Neural Modular Control for Embodied Question Answering

Abstract

We present a modular approach for learning policies for navigation over long planning horizons from language input. Our hierarchical policy operates at multiple timescales: the higher-level master policy proposes subgoals to be executed by specialized sub-policies. Our choice of subgoals is compositional and semantic, i.e., they can be sequentially combined in arbitrary orderings and admit human-interpretable descriptions (e.g. 'exit room', 'find kitchen', 'find refrigerator', etc.). We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning. Independent reinforcement learning at each level of the hierarchy enables sub-policies to adapt to the consequences of their actions and recover from errors. Subsequent joint hierarchical training enables the master policy to adapt to the sub-policies. On the challenging EQA (Das et al., 2018) benchmark in House3D (Wu et al., 2018), which requires navigating diverse, realistic indoor environments, our approach outperforms prior work by a significant margin, both in terms of navigation and question answering.
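The hierarchical control loop the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the subgoal names come from the abstract's examples, while `master_policy`, `sub_policy`, and the state dictionary are hypothetical stand-ins for the learned networks.

```python
# Hedged sketch of a two-level modular controller: a master policy
# proposes semantic subgoals; a specialized sub-policy executes each
# subgoal with low-level actions until it signals termination.

# Semantic subgoal vocabulary (examples taken from the abstract).
SUBGOALS = ["exit_room", "find_kitchen", "find_refrigerator", "answer"]

def master_policy(state):
    # Hypothetical stand-in: a learned master policy would map the
    # agent's observation/state to a subgoal; here we step through
    # the vocabulary in order for illustration.
    return SUBGOALS[min(state["progress"], len(SUBGOALS) - 1)]

def sub_policy(subgoal, state, max_steps=5):
    # Hypothetical stand-in: a learned sub-policy would issue
    # low-level navigation actions until it terminates itself.
    for _ in range(max_steps):
        state["steps"] += 1  # one primitive action per step
    state["progress"] += 1   # report completion back to the master
    return True

def run_episode():
    state = {"progress": 0, "steps": 0}
    trajectory = []  # sequence of subgoals chosen by the master
    while True:
        g = master_policy(state)
        trajectory.append(g)
        sub_policy(g, state)
        if g == "answer":  # terminal subgoal ends the episode
            break
    return trajectory

print(run_episode())
# → ['exit_room', 'find_kitchen', 'find_refrigerator', 'answer']
```

The key structural point is the two timescales: the master policy acts only once per subgoal, while each sub-policy runs many primitive steps in between, which is what shortens the effective planning horizon at the top level.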

Cite

Text

Das et al. "Neural Modular Control for Embodied Question Answering." Conference on Robot Learning, 2018.

Markdown

[Das et al. "Neural Modular Control for Embodied Question Answering." Conference on Robot Learning, 2018.](https://mlanthology.org/corl/2018/das2018corl-neural/)

BibTeX

@inproceedings{das2018corl-neural,
  title     = {{Neural Modular Control for Embodied Question Answering}},
  author    = {Das, Abhishek and Gkioxari, Georgia and Lee, Stefan and Parikh, Devi and Batra, Dhruv},
  booktitle = {Conference on Robot Learning},
  year      = {2018},
  pages     = {53--62},
  url       = {https://mlanthology.org/corl/2018/das2018corl-neural/}
}