Monte-Carlo Planning and Learning with Language Action Value Estimates
Abstract
Interactive Fiction (IF) games provide a useful testbed for language-based reinforcement learning agents, posing significant challenges in natural language understanding, commonsense reasoning, and non-myopic planning over a combinatorial search space. Agents based on standard planning algorithms struggle to play IF games due to the massive search space of language actions. Language-grounded planning is therefore a key ability for such agents, since inferring the consequences of language actions from semantic understanding can drastically improve search. In this paper, we introduce Monte-Carlo planning with Language Action Value Estimates (MC-LAVE), which combines Monte-Carlo tree search with language-driven exploration. MC-LAVE invests more search effort in semantically promising language actions using locally optimistic language value estimates, yielding a significant reduction in the effective search space of language actions. We then present a reinforcement learning approach via MC-LAVE, which alternates between MC-LAVE planning and supervised learning on the self-generated language actions. In experiments, we demonstrate that our method achieves new high scores in various IF games.
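To make the idea of language-driven exploration concrete, below is a minimal Python sketch of a tree-search selection step in which a standard UCT score is augmented by an optimistic language value bonus. The `Node` class, the `lave` estimates, and the constants `c_uct` and `c_lave` are illustrative assumptions for this sketch; the paper's exact scoring formula and data structures are not reproduced here.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical search-tree node; not the paper's data structure."""
    visits: int = 0
    total_return: float = 0.0
    children: dict = field(default_factory=dict)  # action string -> Node
    lave: dict = field(default_factory=dict)      # action string -> language value estimate

def select_action(node: Node, c_uct: float = 1.0, c_lave: float = 1.0) -> str:
    """UCT-style selection with an added language-value bonus.

    The exact weighting used by MC-LAVE may differ; this only
    illustrates biasing search toward semantically promising actions.
    """
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        # Mean Monte-Carlo return of the child.
        q = child.total_return / max(child.visits, 1)
        # Standard visit-count exploration bonus.
        explore = c_uct * math.sqrt(math.log(node.visits + 1) / (child.visits + 1))
        # Optimistic language action value estimate, decayed with visits
        # so the semantic prior matters most for rarely tried actions.
        bonus = c_lave * node.lave.get(action, 0.0) / (child.visits + 1)
        score = q + explore + bonus
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

Decaying the language bonus with the visit count means the semantic prior dominates early, while accumulated Monte-Carlo returns take over once an action has been tried often enough.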
Cite

Text

Jang et al. "Monte-Carlo Planning and Learning with Language Action Value Estimates." International Conference on Learning Representations, 2021.

Markdown

[Jang et al. "Monte-Carlo Planning and Learning with Language Action Value Estimates." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/jang2021iclr-montecarlo/)

BibTeX
@inproceedings{jang2021iclr-montecarlo,
  title     = {{Monte-Carlo Planning and Learning with Language Action Value Estimates}},
  author    = {Jang, Youngsoo and Seo, Seokin and Lee, Jongmin and Kim, Kee-Eung},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/jang2021iclr-montecarlo/}
}