Navigating to Objects Specified by Images

Abstract

Images are a convenient way to specify which particular object instance an embodied agent should navigate to. Solving this task requires semantic visual reasoning and exploration of unknown environments. We present a system that can perform this task in both simulation and the real world. Our modular method solves the sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation. We re-identify the goal instance in egocentric vision using feature matching and localize the goal instance by projecting matched features to a map. Each sub-task is solved using off-the-shelf components requiring zero fine-tuning. On the HM3D InstanceImageNav benchmark, this system outperforms a baseline end-to-end RL policy by 7x and a state-of-the-art ImageNav model by 2.3x (56% vs. 25% success). We deploy this system to a mobile robot platform and demonstrate effective performance in the real world, achieving an 88% success rate across a home and an office environment.
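The re-identify-then-project step described above can be illustrated with a short sketch. This is our own simplification, not the authors' implementation. It assumes a pinhole camera with known horizontal intrinsics, metric depth at each matched keypoint, a known 2D agent pose, and a list of keypoints already matched to the goal image by some feature matcher (not shown). The match-count threshold `MIN_MATCHES`, the rule of taking the centroid of projected points as the goal, and every name below are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical threshold: minimum number of feature matches needed to
# accept that the goal instance is in view (an assumption, not from the paper).
MIN_MATCHES = 50


@dataclass(frozen=True)
class CameraIntrinsics:
    fx: float  # horizontal focal length in pixels
    cx: float  # horizontal principal point in pixels
    # fy/cy are omitted: they only affect height, which a top-down map discards.


@dataclass(frozen=True)
class AgentPose:
    x: float        # world position in meters
    y: float
    heading: float  # radians, counterclockwise from the world +x axis


def is_goal_instance(num_matches: int, min_matches: int = MIN_MATCHES) -> bool:
    """Assumed re-identification rule: enough matches means the goal is in view."""
    return num_matches >= min_matches


def pixel_to_world(u: float, depth: float, K: CameraIntrinsics,
                   pose: AgentPose) -> Tuple[float, float]:
    """Back-project a pixel column with metric depth to 2D world coordinates."""
    lateral = (u - K.cx) * depth / K.fx  # meters to the right of the optical axis
    forward = depth                      # meters along the optical axis
    cos_h, sin_h = math.cos(pose.heading), math.sin(pose.heading)
    # The forward axis is (cos h, sin h); the right axis is (sin h, -cos h).
    wx = pose.x + forward * cos_h + lateral * sin_h
    wy = pose.y + forward * sin_h - lateral * cos_h
    return wx, wy


def localize_goal(matches: List[Tuple[float, float]], K: CameraIntrinsics,
                  pose: AgentPose, resolution: float = 0.05,
                  min_matches: int = MIN_MATCHES) -> Optional[Tuple[int, int]]:
    """Return the map cell of the goal, or None if it is not re-identified.

    `matches` holds (u, depth) for each keypoint in the agent's view that was
    matched to the goal image; `resolution` is the map cell size in meters.
    """
    valid = [(u, d) for (u, d) in matches if d > 0]  # drop keypoints without depth
    if not is_goal_instance(len(valid), min_matches):
        return None
    points = [pixel_to_world(u, d, K, pose) for (u, d) in valid]
    # Assumed localization rule: the centroid of the projected matches.
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    return math.floor(mean_x / resolution), math.floor(mean_y / resolution)
```

With the agent at the origin facing +x, a keypoint at the image center and 2 m depth lands 2 m straight ahead, while one 100 px to the right (with `fx=100`) lands 2 m to the agent's right, at world y = -2. Taking a centroid is only one simple way to summarize the projected points; a real system could, for example, vote over map cells instead to be more robust to outlier matches.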

Cite

Text

Krantz et al. "Navigating to Objects Specified by Images." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01002

Markdown

[Krantz et al. "Navigating to Objects Specified by Images." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/krantz2023iccv-navigating/) doi:10.1109/ICCV51070.2023.01002

BibTeX

@inproceedings{krantz2023iccv-navigating,
  title     = {{Navigating to Objects Specified by Images}},
  author    = {Krantz, Jacob and Gervet, Theophile and Yadav, Karmesh and Wang, Austin and Paxton, Chris and Mottaghi, Roozbeh and Batra, Dhruv and Malik, Jitendra and Lee, Stefan and Chaplot, Devendra Singh},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {10916--10925},
  doi       = {10.1109/ICCV51070.2023.01002},
  url       = {https://mlanthology.org/iccv/2023/krantz2023iccv-navigating/}
}