WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-World Scenarios

Abstract

We introduce WearVQA, the first benchmark specifically designed to evaluate the visual question answering (VQA) capabilities of multi-modal AI assistants on wearable devices such as smart glasses. Unlike prior benchmarks that focus on high-quality, third-person imagery, WearVQA reflects the unique challenges of egocentric interaction, where visual inputs may be occluded, poorly lit, unzoomed, or blurry, and questions are grounded in realistic wearable use cases. The benchmark comprises 2,500 carefully curated image-question-answer triplets spanning 7 diverse image domains, including both text-centric and general scenes, 10 cognitive task types ranging from basic recognition to various forms of reasoning, and 6 common wearable-specific image quality issues. All questions are designed to be answerable using only the visual input and common sense. WearVQA is paired with a rigorous LLM-as-a-judge evaluation framework with 96% labeling accuracy. Open-source and proprietary multi-modal LLMs achieved QA accuracies of only 24–52% on WearVQA, with substantial drops on lower-quality images and reasoning-heavy tasks. These observations position WearVQA as a comprehensive and challenging benchmark for guiding technical advancement toward robust, real-world multi-modal wearable AI systems.
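For orientation, the sketch below shows one way a WearVQA-style record and the QA accuracy metric might be organized in Python. The field names (image_path, domain, quality_issue, etc.) and the normalized exact-match stand-in for the LLM judge are illustrative assumptions, not the benchmark's actual schema or judging prompt, which are defined in the paper.

from dataclasses import dataclass
from typing import List

@dataclass
class WearVQAExample:
    image_path: str     # egocentric photo captured by the wearable (assumed field name)
    question: str       # answerable from the image plus common sense
    answer: str         # reference answer
    domain: str         # one of the 7 image domains
    task: str           # one of the 10 cognitive task types
    quality_issue: str  # one of the 6 wearable-specific quality issues

def judge(prediction: str, reference: str) -> bool:
    # Stand-in for the paper's LLM-as-a-judge call; a normalized exact match
    # is used here purely so the sketch runs end to end.
    return prediction.strip().lower() == reference.strip().lower()

def qa_accuracy(examples: List[WearVQAExample], predictions: List[str]) -> float:
    # Fraction of model predictions the judge accepts as correct.
    correct = sum(judge(p, ex.answer) for ex, p in zip(examples, predictions))
    return correct / len(examples)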

Cite

Text

Chang et al. "WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-World Scenarios." Advances in Neural Information Processing Systems, 2025.

Markdown

[Chang et al. "WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-World Scenarios." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/chang2025neurips-wearvqa/)

BibTeX

@inproceedings{chang2025neurips-wearvqa,
  title     = {{WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-World Scenarios}},
  author    = {Chang, Eun and Huang, Zhuangqun and Liao, Yiwei and Bhavsar, Sagar Ravi and Param, Amogh and Stark, Tammy and Ahmadyan, Adel and Yang, Xiao and Wang, Jiaqi and Abdullah, Ahsan and Nguyen, Giang and Iyer, Akil and Hall, David Patrick and Li, Elissa and Scheffer, Nicolas and Kirmani, Ahmed and Damavandi, Babak and Wanga, Rakesh and Kumar, Anuj and Patel, Rohit and Moon, Seungwhan and Dong, Xin Luna},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/chang2025neurips-wearvqa/}
}