Toward Data-Driven Skill Identification for General-Purpose Vision-Language Models

Abstract

The evolution of vision-language (VL) models toward broad competencies has complicated benchmarking, necessitating diverse tasks for accurate evaluation. Moving beyond the intuition-guided task selection common in existing benchmarks, we propose a data-driven approach that leverages transfer performance and Factor Analysis (FA) to identify latent skills crucial for VL tasks. Our study demonstrates the utility of FA in systematically understanding and evaluating VL models.
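
To make the idea concrete, the following minimal sketch (not the authors' code) applies factor analysis to a hypothetical model-by-task matrix of transfer scores to recover a small number of latent factors, i.e., candidate "skills". The synthetic scores, the choice of three factors, and the use of scikit-learn's FactorAnalysis are illustrative assumptions, not details from the paper.

    # Illustrative sketch: factor analysis over a hypothetical
    # model-x-task transfer-performance matrix (synthetic data).
    import numpy as np
    from sklearn.decomposition import FactorAnalysis
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Rows: model variants / transfer settings; columns: VL evaluation tasks.
    # Values: hypothetical transfer scores (e.g., accuracy) on each task.
    scores = rng.uniform(30, 90, size=(40, 12))

    # Standardize each task so factors reflect shared variation, not score scale.
    X = StandardScaler().fit_transform(scores)

    # Fit FA with a small, assumed number of latent factors ("skills").
    fa = FactorAnalysis(n_components=3, random_state=0)
    fa.fit(X)

    # Loadings: how strongly each task depends on each latent factor.
    loadings = fa.components_.T  # shape: (n_tasks, n_factors)
    for task_idx, row in enumerate(loadings):
        print(f"task {task_idx:02d} loadings:", np.round(row, 2))

In practice, tasks that load heavily on the same factor can be interpreted as drawing on a shared underlying skill, which is the kind of structure the paper's data-driven analysis aims to surface.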

Cite

Text

Tiong et al. "Toward Data-Driven Skill Identification for General-Purpose Vision-Language Models." ICLR 2024 Workshops: DPFM, 2024.

Markdown

[Tiong et al. "Toward Data-Driven Skill Identification for General-Purpose Vision-Language Models." ICLR 2024 Workshops: DPFM, 2024.](https://mlanthology.org/iclrw/2024/tiong2024iclrw-datadriven/)

BibTeX

@inproceedings{tiong2024iclrw-datadriven,
  title     = {{Toward Data-Driven Skill Identification for General-Purpose Vision-Language Models}},
  author    = {Tiong, Anthony and Zhao, Junqi and Li, Junnan and Hoi, Steven and Xiong, Caiming and Li, Boyang},
  booktitle = {ICLR 2024 Workshops: DPFM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/tiong2024iclrw-datadriven/}
}