Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects

Abstract

Unlabeled 3D objects present an opportunity to leverage pretrained vision-language models (VLMs) on a range of annotation tasks, from describing object semantics to inferring physical properties. An accurate response must account for the object's full appearance in 3D, for the various ways the question or prompt can be phrased, and for changes in other factors that affect the response. We present a method to marginalize over arbitrary factors varied across VLM queries, which relies on the VLM's scores for sampled responses. We first show that this aggregation method can outperform a language model (e.g., GPT-4) at summarization, for instance by avoiding hallucinations when responses contain contrasting details. Second, we show that aggregated annotations are useful for prompt chaining: they help improve downstream VLM predictions (e.g., of object material when the object's type is specified as an auxiliary input in the prompt). Such auxiliary inputs allow us to ablate and measure the contribution of visual reasoning over language-only reasoning. Using these evaluations, we show that VLMs approach the quality of human-verified annotations for both type and material inference on the large-scale Objaverse dataset.
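The abstract does not spell out the exact aggregation rule, so the following is only a minimal sketch of score-based marginalization under stated assumptions. It assumes that each query (one combination of rendered view, prompt phrasing, and any other varied factor) yields a log-likelihood score from the VLM for every candidate response, that those scores are normalized into a distribution per query, and that the distributions are averaged with uniform weight over queries. The function name `marginalize_responses`, the softmax normalization, and the uniform weighting are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple


def marginalize_responses(
    log_scores: Dict[str, List[float]], candidates: List[str]
) -> List[Tuple[str, float]]:
    """Hypothetical sketch: average per-query response distributions.

    log_scores maps a query id (e.g. a view/prompt combination) to the VLM's
    log-likelihood of each candidate response under that query. Weighting all
    queries uniformly is an assumption, not the paper's stated rule.
    """
    if not log_scores:
        raise ValueError("need at least one query")
    n = len(candidates)
    totals = [0.0] * n
    for query, scores in log_scores.items():
        if len(scores) != n:
            raise ValueError(f"query {query!r} has {len(scores)} scores, expected {n}")
        # Numerically stable softmax over candidates for this one query.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        for i, e in enumerate(exps):
            totals[i] += e / z
    # Uniform average over queries approximates marginalizing out the factors.
    marginal = [t / len(log_scores) for t in totals]
    return sorted(zip(candidates, marginal), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    candidates = ["chair", "stool", "table"]
    # Illustrative scores only, not data from the paper.
    log_scores = {
        "view0/prompt0": [math.log(0.6), math.log(0.3), math.log(0.1)],
        "view1/prompt0": [math.log(0.2), math.log(0.7), math.log(0.1)],
    }
    for label, prob in marginalize_responses(log_scores, candidates):
        print(f"{label}: {prob:.2f}")
```

With these made-up scores, "chair" wins in one view and "stool" in the other, but averaging the per-query distributions ranks "stool" first (0.50) ahead of "chair" (0.40). This illustrates how aggregating scores, rather than picking any single query's answer, can resolve contrasting responses.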

Cite

Text

Kabra et al. "Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects." NeurIPS 2023 Workshops: SyntheticData4ML, 2023.

Markdown

[Kabra et al. "Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects." NeurIPS 2023 Workshops: SyntheticData4ML, 2023.](https://mlanthology.org/neuripsw/2023/kabra2023neuripsw-evaluating/)

BibTeX

@inproceedings{kabra2023neuripsw-evaluating,
  title     = {{Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects}},
  author    = {Kabra, Rishabh and Matthey, Loic and Lerchner, Alexander and Mitra, Niloy},
  booktitle = {NeurIPS 2023 Workshops: SyntheticData4ML},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/kabra2023neuripsw-evaluating/}
}