Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
Abstract
Rapid advancements in text-to-3D generation require robust and scalable evaluation metrics that align closely with human judgment, a need unmet by current metrics such as PSNR and CLIP, which require ground-truth data or focus only on prompt fidelity. To address this, we introduce Gen3DEval, a novel evaluation framework that leverages vision large language models (vLLMs) specifically fine-tuned for 3D object quality assessment. Gen3DEval evaluates text fidelity, appearance, and surface quality by analyzing 3D surface normals, without requiring ground-truth comparisons, bridging the gap between automated metrics and user preferences. Compared to state-of-the-art task-agnostic models, Gen3DEval demonstrates superior performance in user-aligned evaluations, placing it as a comprehensive and accessible benchmark for future research on text-to-3D generation. The project page can be found here: https://shalini-maiti.github.io/gen3deval.github.io/.
Cite
Text
Maiti et al. "Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01729
Markdown
[Maiti et al. "Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/maiti2025cvpr-gen3deval/) doi:10.1109/CVPR52734.2025.01729
BibTeX
@inproceedings{maiti2025cvpr-gen3deval,
title = {{Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects}},
author = {Maiti, Shalini and Agapito, Lourdes and Kokkinos, Filippos},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {18552-18562},
doi = {10.1109/CVPR52734.2025.01729},
url = {https://mlanthology.org/cvpr/2025/maiti2025cvpr-gen3deval/}
}