IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs

Faraz, Ali; Akash; Khan, Shaharukh; Kolla, Raja; Patidar, Akshat; Goswami, Suranjan; Ravi, Abhinav; Khatri, Chandra; Agarwal, Shubham

ICLR 2026

Abstract

Vision-language models (VLMs) have demonstrated impressive generalization across multimodal tasks, yet most evaluation benchmarks remain Western-centric, leaving open questions about how VLMs perform in culturally diverse and multilingual settings. To address this gap, we introduce IndicVisionBench, the first large-scale benchmark centered on the Indian subcontinent. Covering English and 10 Indian languages, our benchmark spans 3 multimodal tasks: Optical Character Recognition (OCR), Multimodal Machine Translation (MMT), and Visual Question Answering (VQA), covering 6 question types. The final benchmark comprises ~5K images and 37K+ QA pairs across 13 culturally grounded topics. In addition, we release a paired parallel corpus of annotations across 10 Indic languages, creating a unique resource for analyzing cultural and linguistic biases in VLMs. We evaluate 8 models, ranging from proprietary closed-source systems to open-weights medium and large-scale models. Our experiments reveal substantial performance gaps, underscoring the limitations of current VLMs in culturally diverse contexts. By centering cultural diversity and multilinguality, IndicVisionBench establishes a reproducible evaluation framework that paves the way for more inclusive multimodal research. Our benchmark is publicly available at https://huggingface.co/datasets/krutrim-ai-labs/IndicVisionBench.

Cite

Text

Faraz et al. "IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs." International Conference on Learning Representations, 2026.

Markdown

[Faraz et al. "IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/faraz2026iclr-indicvisionbench/)

BibTeX

@inproceedings{faraz2026iclr-indicvisionbench,
  title     = {{IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs}},
  author    = {Faraz, Ali and {Akash} and Khan, Shaharukh and Kolla, Raja and Patidar, Akshat and Goswami, Suranjan and Ravi, Abhinav and Khatri, Chandra and Agarwal, Shubham},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/faraz2026iclr-indicvisionbench/}
}