Visual Pre-Training for Navigation: What Can We Learn from Noise?

Abstract

In visual navigation, one powerful paradigm is to predict actions directly from observations. Training such an end-to-end system allows representations that are useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data-hungry. We hypothesize that a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such a random crop prediction task in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. Code is available at https://github.com/yanweiw/noise2ptz
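
To make the pretext task concrete, the sketch below illustrates one possible way to set up random crop prediction on synthetic noise images: a noise image plays the role of the current view, a resized random crop of it plays the role of the goal view, and a small network regresses the crop's normalized location and size. All function names, the architecture, and the hyperparameters here are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

# Hypothetical sketch of the random-crop-prediction pretext task on noise images.
# Names, architecture, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_noise_pair(size=128, min_crop=32, max_crop=96):
    """Sample a noise image, a random crop of it, and the crop's (x, y, scale) label."""
    img = torch.rand(3, size, size)                      # synthetic noise "current view"
    s = torch.randint(min_crop, max_crop + 1, (1,)).item()
    x = torch.randint(0, size - s + 1, (1,)).item()
    y = torch.randint(0, size - s + 1, (1,)).item()
    crop = img[:, y:y + s, x:x + s]                      # acts as the "goal view"
    goal = F.interpolate(crop.unsqueeze(0), size=size, mode='bilinear',
                         align_corners=False).squeeze(0)
    label = torch.tensor([x / size, y / size, s / size]) # normalized location and size
    return img, goal, label

class CropPredictor(nn.Module):
    """Predict where the goal view sits inside the current view."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 5, stride=2), nn.ReLU(),    # current and goal views stacked
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, 3)                    # regress (x, y, scale)

    def forward(self, current, goal):
        return self.head(self.encoder(torch.cat([current, goal], dim=1)))

model = CropPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    batch = [make_noise_pair() for _ in range(32)]
    imgs = torch.stack([b[0] for b in batch])
    goals = torch.stack([b[1] for b in batch])
    labels = torch.stack([b[2] for b in batch])
    loss = F.mse_loss(model(imgs, goals), labels)
    opt.zero_grad(); loss.backward(); opt.step()

Because the supervision signal (the crop parameters) is generated for free, no natural images or interaction data are needed at this stage; the encoder learned this way is what the paper then reuses to train the navigation policy.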

Cite

Text

Wang et al. "Visual Pre-Training for Navigation: What Can We Learn from Noise?" NeurIPS 2022 Workshops: SyntheticData4ML, 2022.

Markdown

[Wang et al. "Visual Pre-Training for Navigation: What Can We Learn from Noise?" NeurIPS 2022 Workshops: SyntheticData4ML, 2022.](https://mlanthology.org/neuripsw/2022/wang2022neuripsw-visual/)

BibTeX

@inproceedings{wang2022neuripsw-visual,
  title     = {{Visual Pre-Training for Navigation: What Can We Learn from Noise?}},
  author    = {Wang, Yanwei and Ko, Ching-Yun and Agrawal, Pulkit},
  booktitle = {NeurIPS 2022 Workshops: SyntheticData4ML},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/wang2022neuripsw-visual/}
}