Holistic Evaluation of Text-to-Image Models

Abstract

The stunning qualitative improvement of text-to-image models has led to widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects: text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects; different models demonstrate different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/latest and the code at https://github.com/stanford-crfm/helm, which is integrated with the HELM codebase.
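As a rough illustration of one of the aspects listed above, text-image alignment is commonly approximated with a CLIP-based similarity between a prompt and a generated image. The sketch below is illustrative only, not necessarily the exact metric used in HEIM; the model checkpoint and file name are assumptions.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumption: any public CLIP checkpoint works for this illustration.
MODEL_NAME = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def clip_alignment_score(prompt: str, image: Image.Image) -> float:
    """Cosine similarity between CLIP text and image embeddings (higher = better aligned)."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    # Normalize embeddings so the dot product equals cosine similarity.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return float((text_emb * image_emb).sum())

# Hypothetical usage: score a generated image against its prompt.
# score = clip_alignment_score("a red bicycle leaning against a wall", Image.open("generated.png"))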

Cite

Text

Lee et al. "Holistic Evaluation of Text-to-Image Models." Neural Information Processing Systems, 2023.

Markdown

[Lee et al. "Holistic Evaluation of Text-to-Image Models." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/lee2023neurips-holistic/)

BibTeX

@inproceedings{lee2023neurips-holistic,
  title     = {{Holistic Evaluation of Text-to-Image Models}},
  author    = {Lee, Tony and Yasunaga, Michihiro and Meng, Chenlin and Mai, Yifan and Park, Joon Sung and Gupta, Agrim and Zhang, Yunzhi and Narayanan, Deepak and Teufel, Hannah and Bellagente, Marco and Kang, Minguk and Park, Taesung and Leskovec, Jure and Zhu, Jun-Yan and Li, Fei-Fei and Wu, Jiajun and Ermon, Stefano and Liang, Percy},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/lee2023neurips-holistic/}
}