Looking at Words and Points with Attention: A Benchmark for Text-to-Shape Coherence

Abstract

While text-conditional 3D object generation and manipulation have seen rapid progress, the evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark. The reason is twofold: a) the low quality of the textual descriptions in the only publicly available dataset of text-shape pairs; b) the limited effectiveness of the metrics used to quantitatively assess such coherence. In this paper, we propose a comprehensive solution that addresses both weaknesses. First, we employ large language models to automatically refine the textual descriptions associated with shapes. Second, we propose a quantitative metric that assesses text-to-shape coherence through cross-attention mechanisms. To validate our approach, we conduct a user study and quantitatively compare our metric with existing ones. The refined dataset, the new metric, and a set of text-shape pairs validated by the user study comprise a novel, fine-grained benchmark that we publicly release to foster research on the text-to-shape coherence of text-conditioned 3D generative models. The benchmark is available at https://cvlab-unibo.github.io/CrossCoherence-Web/.

Cite

Text

Amaduzzi et al. "Looking at Words and Points with Attention: A Benchmark for Text-to-Shape Coherence." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00309

Markdown

[Amaduzzi et al. "Looking at Words and Points with Attention: A Benchmark for Text-to-Shape Coherence." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/amaduzzi2023iccvw-looking/) doi:10.1109/ICCVW60793.2023.00309

BibTeX

@inproceedings{amaduzzi2023iccvw-looking,
  title     = {{Looking at Words and Points with Attention: A Benchmark for Text-to-Shape Coherence}},
  author    = {Amaduzzi, Andrea and Lisanti, Giuseppe and Salti, Samuele and Di Stefano, Luigi},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {2860--2869},
  doi       = {10.1109/ICCVW60793.2023.00309},
  url       = {https://mlanthology.org/iccvw/2023/amaduzzi2023iccvw-looking/}
}