VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Abstract

Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask if pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images.

Cite

Text

Maruf et al. "VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images." Neural Information Processing Systems, 2024. doi:10.52202/079017-4165

Markdown

[Maruf et al. "VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/maruf2024neurips-vlm4bio/) doi:10.52202/079017-4165

BibTeX

@inproceedings{maruf2024neurips-vlm4bio,
  title     = {{VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images}},
  author    = {Maruf, M. and Daw, Arka and Mehrab, Kazi Sajeed and Manogaran, Harish Babu and Neog, Abhilash and Sawhney, Medha and Khurana, Mridul and Balhoff, James P. and Bakış, Yasin and Altintas, Bahadir and Thompson, Matthew J and Campolongo, Elizabeth G and Uyeda, Josef C. and Lapp, Hilmar and Bart, Jr., Henry L. and Mabee, Paula M. and Su, Yu and Chao, Wei-Lun and Stewart, Charles and Berger-Wolf, Tanya and Dahdul, Wasila and Karpatne, Anuj},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4165},
  url       = {https://mlanthology.org/neurips/2024/maruf2024neurips-vlm4bio/}
}