AutoEval Done Right: Using Synthetic Data for Model Evaluation

Abstract

Evaluating machine learning models with human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can reduce the number of human annotations required, a process called autoevaluation. We suggest efficient and statistically principled autoevaluation algorithms that improve sample efficiency while remaining unbiased.
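
As a rough illustration of how AI labels and a small human-labeled set can be combined without introducing bias, the sketch below estimates a model's mean score from many AI-labeled examples and then corrects it using the average human-minus-AI gap measured on the human-labeled subset. This is a generic bias-corrected estimator written under our own assumptions; it is not taken from the paper and need not match the authors' algorithms. The function name `autoeval_mean` and its argument names are hypothetical.

```python
from statistics import fmean


def autoeval_mean(human_labels, ai_on_labeled, ai_on_unlabeled):
    """Bias-corrected estimate of a mean score (illustrative sketch only).

    human_labels    : human scores on the small labeled set
    ai_on_labeled   : AI scores on the same labeled examples, in the same order
    ai_on_unlabeled : AI scores on the large set that has no human labels
    """
    if len(human_labels) != len(ai_on_labeled):
        raise ValueError("human and AI scores on the labeled set must be paired")
    if not human_labels or not ai_on_unlabeled:
        raise ValueError("both the labeled and the unlabeled set must be non-empty")

    # Mean of the cheap AI labels: low variance, but possibly biased.
    synthetic_mean = fmean(ai_on_unlabeled)

    # Average human-minus-AI gap on the labeled set: estimates the AI's bias.
    correction = fmean([h - a for h, a in zip(human_labels, ai_on_labeled)])

    return synthetic_mean + correction


# Toy example: the AI annotator marks every answer as correct, while humans
# accept only half of the answers in the labeled set.
estimate = autoeval_mean(
    human_labels=[1, 0, 1, 0],
    ai_on_labeled=[1, 1, 1, 1],
    ai_on_unlabeled=[1, 1, 1, 1, 1, 1, 1, 1],
)
print(estimate)
```

In this toy case the AI-only mean would be 1.0, and the correction term pulls the estimate back toward the human-judged accuracy. The closer the AI labels track the human ones, the less noisy the correction term, which is the intuition behind needing fewer human annotations.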

Cite

Text

Boyeau et al. "AutoEval Done Right: Using Synthetic Data for Model Evaluation." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Boyeau et al. "AutoEval Done Right: Using Synthetic Data for Model Evaluation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/boyeau2025icml-autoeval/)

BibTeX

@inproceedings{boyeau2025icml-autoeval,
  title     = {{AutoEval Done Right: Using Synthetic Data for Model Evaluation}},
  author    = {Boyeau, Pierre and Angelopoulos, Anastasios Nikolas and Li, Tianle and Yosef, Nir and Malik, Jitendra and Jordan, Michael I.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {5276--5290},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/boyeau2025icml-autoeval/}
}