MOPA: Modular Object Navigation with PointGoal Agents

Abstract

We propose MOPA (Modular ObjectNav with PointGoal agents), a simple but effective modular approach to systematically investigate the inherent modularity of the object navigation task in Embodied AI. MOPA consists of four modules: (a) an object detection module trained to identify objects from RGB images, (b) a map-building module to build a semantic map of the observed objects, (c) an exploration module enabling the agent to explore the environment, and (d) a navigation module to move to identified target objects. We show that we can effectively reuse a pretrained PointGoal agent as the navigation module instead of learning to navigate from scratch, thus saving time and compute. We also compare various exploration strategies for MOPA and find that a simple uniform strategy significantly outperforms more advanced exploration methods.
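The four modules described in the abstract compose into a single episode loop: detect objects, record them in a semantic map, explore until the target has been seen, then hand its mapped location to the PointGoal agent. The following is a minimal sketch of that loop; all interfaces (`env`, `detector`, `pointnav_agent`) are hypothetical stand-ins, not the paper's actual implementation, which builds on a simulator and a pretrained PointGoal policy.

```python
# Hypothetical sketch of the MOPA modular loop; interface names are
# illustrative assumptions, not the paper's real API.
import random


class SemanticMap:
    """(b) Map-building module: records where detected objects were seen."""

    def __init__(self):
        self.objects = {}  # object label -> first observed (x, y) position

    def update(self, detections, agent_pos):
        for label in detections:
            self.objects.setdefault(label, agent_pos)

    def lookup(self, label):
        return self.objects.get(label)


def uniform_exploration_goal(bounds, rng):
    """(c) Exploration module: sample a point uniformly within map bounds
    (the simple strategy the paper finds most effective)."""
    (x0, x1), (y0, y1) = bounds
    return (rng.uniform(x0, x1), rng.uniform(y0, y1))


def mopa_episode(env, detector, pointnav_agent, target_label, bounds,
                 max_steps=500, seed=0):
    """Run one ObjectNav episode by composing the four modules."""
    rng = random.Random(seed)
    smap = SemanticMap()
    for _ in range(max_steps):
        rgb, agent_pos = env.observe()
        smap.update(detector(rgb), agent_pos)   # (a) detect + (b) map
        goal = smap.lookup(target_label)
        if goal is not None:
            # (d) Navigation module: reuse the pretrained PointGoal agent
            # to reach the mapped target location.
            return pointnav_agent.navigate_to(goal)
        # Target not yet mapped: keep exploring uniformly.
        pointnav_agent.navigate_to(uniform_exploration_goal(bounds, rng))
    return None  # target never observed within the step budget
```

Note how the navigation module is the only learned component reused here: both exploration and goal-reaching are delegated to the same PointGoal agent, which is what lets MOPA avoid training a navigation policy from scratch.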

Cite

Text

Raychaudhuri et al. "MOPA: Modular Object Navigation with PointGoal Agents." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Raychaudhuri et al. "MOPA: Modular Object Navigation with PointGoal Agents." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/raychaudhuri2024wacv-mopa/)

BibTeX

@inproceedings{raychaudhuri2024wacv-mopa,
  title     = {{MOPA: Modular Object Navigation with PointGoal Agents}},
  author    = {Raychaudhuri, Sonia and Campari, Tommaso and Jain, Unnat and Savva, Manolis and Chang, Angel X.},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {5763--5773},
  url       = {https://mlanthology.org/wacv/2024/raychaudhuri2024wacv-mopa/}
}