Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents

Abstract

We investigate the task of learning to interpret natural language instructions by jointly reasoning over visual observations and language inputs. Unlike current methods, which start by learning from demonstrations (LfD) and then use reinforcement learning (RL) to fine-tune the model parameters, we propose a novel policy optimization algorithm that dynamically schedules demonstration learning and RL. The proposed training paradigm provides efficient exploration and generalization beyond existing methods. Compared to existing ensemble models, the best single model based on our proposed method reduces the execution error by 55% on a block-world environment. To further illustrate the exploration strategy of our RL algorithm, we include a systematic study of how policy entropy evolves during training.
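For a discrete action space, the policy entropy mentioned above is the Shannon entropy of the action distribution, H(π(·|s)) = −Σ_a π(a|s) log π(a|s): high values indicate near-uniform (exploratory) action selection, and values near zero indicate a near-deterministic policy. The following is a minimal sketch of that quantity only, not the authors' code; the function name `policy_entropy` and the example distributions are illustrative assumptions.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution.

    `probs` is a sequence of action probabilities pi(a|s) for one state.
    This helper is hypothetical and only illustrates the definition.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p*log(p) is 0).
    return sum(-p * math.log(p) for p in probs if p > 0)


# Illustrative (made-up) action distributions, from exploratory to greedy.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]
greedy = [1.0, 0.0, 0.0, 0.0]

for name, dist in [("uniform", uniform), ("peaked", peaked), ("greedy", greedy)]:
    print(name, policy_entropy(dist))
```

Averaging this value over the states visited at each training checkpoint gives one way to plot an entropy curve over the course of training.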

Cite

Text

Xiong et al. "Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents." International Joint Conference on Artificial Intelligence, 2018. doi:10.24963/IJCAI.2018/626

Markdown

[Xiong et al. "Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents." International Joint Conference on Artificial Intelligence, 2018.](https://mlanthology.org/ijcai/2018/xiong2018ijcai-scheduled/) doi:10.24963/IJCAI.2018/626

BibTeX

@inproceedings{xiong2018ijcai-scheduled,
  title     = {{Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents}},
  author    = {Xiong, Wenhan and Guo, Xiaoxiao and Yu, Mo and Chang, Shiyu and Zhou, Bowen and Wang, William Yang},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {4503--4509},
  doi       = {10.24963/IJCAI.2018/626},
  url       = {https://mlanthology.org/ijcai/2018/xiong2018ijcai-scheduled/}
}