MixEval-X: Any-to-Any Evaluations from Real-World Data Mixture

Abstract

Perceiving and generating diverse modalities are crucial for AI models to effectively learn from and engage with real-world signals, necessitating reliable evaluations for their development. We identify two major issues in current evaluations: (1) inconsistent standards, shaped by different communities with varying protocols and maturity levels; and (2) significant query, grading, and generalization biases. To address these, we introduce MixEval-X, the first any-to-any, real-world benchmark designed to optimize and standardize evaluations across diverse input and output modalities. We propose multi-modal benchmark mixture and adaptation-rectification pipelines to reconstruct real-world task distributions, ensuring evaluations generalize effectively to real-world use cases. Extensive meta-evaluations show our approach effectively aligns benchmark samples with real-world task distributions. Meanwhile, MixEval-X's model rankings correlate strongly with those of crowd-sourced real-world evaluations (up to 0.98) while being much more efficient. We provide comprehensive leaderboards to rerank existing models and organizations and offer insights to enhance understanding of multi-modal evaluations and inform future research.

Cite

Text

Ni et al. "MixEval-X: Any-to-Any Evaluations from Real-World Data Mixture." International Conference on Learning Representations, 2025.

Markdown

[Ni et al. "MixEval-X: Any-to-Any Evaluations from Real-World Data Mixture." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/ni2025iclr-mixevalx/)

BibTeX

@inproceedings{ni2025iclr-mixevalx,
  title     = {{MixEval-X: Any-to-Any Evaluations from Real-World Data Mixture}},
  author    = {Ni, Jinjie and Song, Yifan and Ghosal, Deepanway and Li, Bo and Zhang, David Junhao and Yue, Xiang and Xue, Fuzhao and Deng, Yuntian and Zheng, Zian and Zhang, Kaichen and Shah, Mahir and Jain, Kabir and You, Yang and Shieh, Michael},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/ni2025iclr-mixevalx/}
}