ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla

Abstract

Visual Question Answering (VQA) is the task of answering a natural language question about a visual context. Bangla, despite being widely spoken, is considered low-resource in VQA because it lacks proper benchmarks, which makes it challenging even for models known to perform well in other languages. Furthermore, existing Bangla VQA datasets offer little regional relevance and are largely adapted from their foreign counterparts. To address these challenges, we introduce a large-scale Bangla VQA dataset, ChitroJera, totaling over 15k samples from diverse and locally relevant data sources. We assess the performance of text encoders, image encoders, multimodal models, and our novel dual-encoder models. The experiments reveal that the pre-trained dual-encoders outperform other models of their scale. We also evaluate current large vision language models (LVLMs) using prompt-based techniques, and they achieve the best overall performance. Given the underdeveloped state of existing datasets, we envision ChitroJera expanding the scope of Vision-Language tasks in Bangla. Our code and data are available at: http://github.com/farhanishmam/ChitroJera.

Cite

Text

Barua et al. "ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06078-5_27

Markdown

[Barua et al. "ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/barua2025ecmlpkdd-chitrojera/) doi:10.1007/978-3-032-06078-5_27

BibTeX

@inproceedings{barua2025ecmlpkdd-chitrojera,
  title     = {{ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla}},
  author    = {Barua, Deeparghya Dutta and Sourove, Md Sakib Ul Rahman and Fahim, Md and Haider, Fabiha and Shifat, Fariha Tanjim and Adib, Md Tasmim Rahman and Uddin, Anam Borhan and Ishmam, Md Farhan and Alam, Md. Farhad},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {473--491},
  doi       = {10.1007/978-3-032-06078-5_27},
  url       = {https://mlanthology.org/ecmlpkdd/2025/barua2025ecmlpkdd-chitrojera/}
}