Where Are We in the Search for an Artificial Visual Cortex for Embodied Intelligence?

Abstract

We present the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI). First, we curate CORTEXBENCH, consisting of 17 different EAI tasks spanning locomotion, navigation, dexterous, and mobile manipulation. Next, we systematically evaluate existing visual foundation models and find that none is universally dominant. To study the effect of pre-training data scale and diversity, we combine ImageNet with over 4,000 hours of egocentric videos from 7 different sources (over 5.6M images) and train different-sized vision transformers using Masked Auto-Encoding (MAE) on slices of this data. These models required over 10,000 GPU-hours to train and will be open-sourced to the community. We find that scaling dataset size and diversity does not improve performance across all tasks but does so on average. Finally, we show that adding a second pre-training step on a small in-domain dataset improves performance, matching or outperforming the best known results in this setting.
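
For context on the pre-training recipe the abstract describes, below is a minimal, illustrative PyTorch sketch of Masked Auto-Encoding (MAE) in the style of He et al. (2022): patches are randomly masked, only visible patches are encoded, and a light decoder reconstructs the masked pixels. The class name TinyMAE, the module sizes, and the masking details are assumptions made for illustration; this is not the authors' released configuration or code.

# Minimal MAE pre-training sketch (illustrative; not the paper's implementation).
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=192, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.num_patches = (img_size // patch) ** 2
        self.embed = nn.Linear(3 * patch * patch, dim)        # patch -> token
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=4)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(dim, 3 * patch * patch)         # token -> pixels

    def patchify(self, imgs):
        B, C, H, W = imgs.shape
        p = self.patch
        x = imgs.reshape(B, C, H // p, p, W // p, p)
        return x.permute(0, 2, 4, 1, 3, 5).reshape(B, self.num_patches, -1)

    def forward(self, imgs):
        tokens = self.embed(self.patchify(imgs)) + self.pos
        B, N, D = tokens.shape
        keep = int(N * (1 - self.mask_ratio))
        # Random masking: keep a random 25% of patch tokens per image.
        shuffle = torch.rand(B, N, device=imgs.device).argsort(dim=1)
        visible_idx = shuffle[:, :keep]
        idx = visible_idx.unsqueeze(-1).expand(-1, -1, D)
        visible = torch.gather(tokens, 1, idx)
        encoded = self.encoder(visible)                       # encode visible only
        # Fill all positions with the mask token, then scatter encoded
        # tokens back into their original (visible) positions.
        full = self.mask_token.expand(B, N, D).clone()
        full.scatter_(1, idx, encoded)
        pred = self.head(self.decoder(full + self.pos))
        # Reconstruction loss is computed on masked patches only.
        target = self.patchify(imgs)
        masked = torch.ones(B, N, device=imgs.device, dtype=torch.bool)
        masked.scatter_(1, visible_idx, False)
        return ((pred - target) ** 2).mean(dim=-1)[masked].mean()

loss = TinyMAE()(torch.randn(2, 3, 224, 224))  # one pre-training step's loss
loss.backward()

In this paper's setting, the training images would be frames sampled from the egocentric video sources plus ImageNet, and the encoder (without the decoder) would then serve as the frozen visual backbone for the downstream EAI tasks.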

Cite

Text

Majumdar et al. "Where Are We in the Search for an Artificial Visual Cortex for Embodied Intelligence?" ICLR 2023 Workshops: RRL, 2023.

Markdown

[Majumdar et al. "Where Are We in the Search for an Artificial Visual Cortex for Embodied Intelligence?" ICLR 2023 Workshops: RRL, 2023.](https://mlanthology.org/iclrw/2023/majumdar2023iclrw-we/)

BibTeX

@inproceedings{majumdar2023iclrw-we,
  title     = {{Where Are We in the Search for an Artificial Visual Cortex for Embodied Intelligence?}},
  author    = {Majumdar, Arjun and Yadav, Karmesh and Arnaud, Sergio and Ma, Yecheng Jason and Chen, Claire and Silwal, Sneha and Jain, Aryan and Berges, Vincent-Pierre and Abbeel, Pieter and Batra, Dhruv and Lin, Yixin and Maksymets, Oleksandr and Rajeswaran, Aravind and Meier, Franziska},
  booktitle = {ICLR 2023 Workshops: RRL},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/majumdar2023iclrw-we/}
}