InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500k Dialogues
Abstract
Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot Interaction (HRI) have often relied on predefined interaction templates, leading to reduced performance in realistic and open-ended scenarios. To address these issues, we present a large-scale dataset, InViG, for interactive visual grounding under language ambiguity. Our dataset comprises over 520K images accompanied by open-ended, goal-oriented disambiguation dialogues, encompassing millions of object instances and corresponding question-answer pairs. Leveraging the InViG dataset, we conduct extensive studies and propose a set of baseline solutions for end-to-end interactive visual disambiguation and grounding, achieving a 45.6% success rate during validation. To the best of our knowledge, the InViG dataset is the first large-scale dataset for resolving open-ended interactive visual grounding, presenting a practical yet highly challenging benchmark for ambiguity-aware HRI. Code and datasets are available at: https://openivg.github.io.
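The abstract reports a 45.6% success rate but does not define "success" here. As a minimal sketch, assume the criterion common in visual grounding: an episode succeeds when the final predicted bounding box overlaps the target box with intersection-over-union (IoU) of at least 0.5. The threshold, the `(x1, y1, x2, y2)` box format, and the helper names `iou` and `success_rate` are illustrative assumptions, not taken from the InViG paper or its released code.

```python
from typing import List, Tuple

# Hypothetical box format (assumption): (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def success_rate(preds: List[Box], targets: List[Box], thresh: float = 0.5) -> float:
    """Fraction of dialogue episodes whose final predicted box matches the target.

    The IoU >= 0.5 criterion is an assumed convention, not the paper's stated metric.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, t) >= thresh for p, t in zip(preds, targets))
    return hits / len(preds)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    targets = [(0, 0, 10, 10), (20, 20, 30, 30)]
    print(success_rate(preds, targets))  # one exact match, one miss
```

Under this assumed criterion, one exact match and one non-overlapping prediction give a success rate of 0.5.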
Cite
Text
Zhang et al. "InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500k Dialogues." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00560
Markdown
[Zhang et al. "InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500k Dialogues." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/zhang2024cvprw-invig/) doi:10.1109/CVPRW63382.2024.00560
BibTeX
@inproceedings{zhang2024cvprw-invig,
title = {{InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500k Dialogues}},
author = {Zhang, Hanbo and Xu, Jie and Mo, Yuchen and Kong, Tao},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {5508-5518},
doi = {10.1109/CVPRW63382.2024.00560},
url = {https://mlanthology.org/cvprw/2024/zhang2024cvprw-invig/}
}