LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Abstract
Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. LM-Nav extracts landmark names from an instruction, grounds them in the world via the image-language model, and then reaches them via the (vision-only) navigation model. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions.
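The three stages named in the abstract (extract landmark names, ground them in observations, hand the resulting goals to a navigation model) can be illustrated with a toy sketch. This is not the authors' implementation: every function name, the fixed landmark vocabulary, and the similarity table are hypothetical stand-ins for the language model, the image-language model, and the robot's observations, and the greedy per-landmark choice is a simplification assumed here for brevity.

```python
from typing import Dict, List


def extract_landmarks(instruction: str, vocabulary: List[str]) -> List[str]:
    """Hypothetical stand-in for the language model.

    Returns the vocabulary phrases that occur in the instruction,
    ordered by where they first appear in the text.
    """
    text = instruction.lower()
    found = [(text.find(phrase), phrase) for phrase in vocabulary if phrase in text]
    return [phrase for _, phrase in sorted(found)]


def ground_landmarks(
    landmarks: List[str], node_scores: Dict[str, Dict[int, float]]
) -> List[int]:
    """Hypothetical stand-in for the image-language model.

    node_scores[landmark][node] is an assumed similarity between a landmark
    phrase and the image observed at a node. Each landmark is matched
    greedily to its highest-scoring node.
    """
    return [max(node_scores[lm], key=node_scores[lm].get) for lm in landmarks]


def plan_goals(
    instruction: str, vocabulary: List[str], node_scores: Dict[str, Dict[int, float]]
) -> List[int]:
    """Chain the two stages and return an ordered list of goal nodes.

    A goal-conditioned, vision-only navigation policy would then be asked
    to reach each goal node in turn; that stage is outside this sketch.
    """
    return ground_landmarks(extract_landmarks(instruction, vocabulary), node_scores)


# Made-up example data, for illustration only.
instruction = "Go past the stop sign, then head to the picnic table."
vocabulary = ["picnic table", "stop sign"]
node_scores = {
    "stop sign": {0: 0.1, 1: 0.8, 2: 0.2},
    "picnic table": {0: 0.3, 1: 0.1, 2: 0.9},
}
goals = plan_goals(instruction, vocabulary, node_scores)
```

The point of the sketch is the interface between stages: text goes in, an ordered list of visual goals comes out, and the navigation model never needs to see language.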
Cite
Text
Shah et al. "LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action." Conference on Robot Learning, 2022.
Markdown
[Shah et al. "LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action." Conference on Robot Learning, 2022.](https://mlanthology.org/corl/2022/shah2022corl-lmnav/)
BibTeX
@inproceedings{shah2022corl-lmnav,
title = {{LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action}},
author = {Shah, Dhruv and Osiński, Błażej and Ichter, Brian and Levine, Sergey},
booktitle = {Conference on Robot Learning},
year = {2022},
pages = {492--504},
volume = {205},
url = {https://mlanthology.org/corl/2022/shah2022corl-lmnav/}
}