Data-Efficient Large Vision Models Through Sequential Autoregression

Abstract

Training general-purpose vision models on purely sequential visual data, eschewing linguistic inputs, has heralded a new frontier in visual understanding. These models are intended to not only comprehend but also seamlessly transit to out-of-domain tasks. However, current endeavors are hamstrung by an over-reliance on colossal models, exemplified by models with upwards of 3B parameters, and the necessity for an extensive corpus of visual data, often comprising a staggering 400B tokens. In this paper, we delve into the development of an efficient, autoregression-based vision model, innovatively architected to operate on a limited dataset. We meticulously demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding during the testing phase. Our empirical evaluations underscore the model’s agility in adapting to various tasks, heralding a significant reduction in the parameter footprint, and a marked decrease in training data requirements, thereby paving the way for more sustainable and accessible advancements in the field of generalist vision models. The code is available at https://github.com/ggjy/DeLVM.

Cite

Text

Hao et al. "Data-Efficient Large Vision Models Through Sequential Autoregression." International Conference on Machine Learning, 2024.

Markdown

[Hao et al. "Data-Efficient Large Vision Models Through Sequential Autoregression." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/hao2024icml-dataefficient/)

BibTeX

@inproceedings{hao2024icml-dataefficient,
  title     = {{Data-Efficient Large Vision Models Through Sequential Autoregression}},
  author    = {Hao, Zhiwei and Guo, Jianyuan and Wang, Chengcheng and Tang, Yehui and Wu, Han and Hu, Han and Han, Kai and Xu, Chang},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {17572-17596},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/hao2024icml-dataefficient/}
}