Leveraging Multimodal Large Language Models for Joint Discrete and Continuous Evaluation in Text-to-Image Alignment

Cite

Text

Zhang et al. "Leveraging Multimodal Large Language Models for Joint Discrete and Continuous Evaluation in Text-to-Image Alignment." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Zhang et al. "Leveraging Multimodal Large Language Models for Joint Discrete and Continuous Evaluation in Text-to-Image Alignment." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/zhang2025cvprw-leveraging/)

BibTeX

@inproceedings{zhang2025cvprw-leveraging,
  title     = {{Leveraging Multimodal Large Language Models for Joint Discrete and Continuous Evaluation in Text-to-Image Alignment}},
  author    = {Zhang, Zhichao and Li, Xinyue and Sun, Wei and Zhang, Zicheng and Li, Yunhao and Liu, Xiaohong and Zhai, Guangtao},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {977--986},
  url       = {https://mlanthology.org/cvprw/2025/zhang2025cvprw-leveraging/}
}