AutoEval Done Right: Using Synthetic Data for Model Evaluation
Abstract
Evaluating machine learning models on human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can reduce the number of human annotations this requires, a process called autoevaluation. We propose efficient, statistically principled autoevaluation algorithms that improve sample efficiency while remaining unbiased.
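To make "unbiased yet more sample-efficient" concrete, here is a minimal sketch of one well-known estimator with that property, the prediction-powered inference (PPI) style mean estimator. We assume, rather than claim, that it reflects the flavor of the algorithms summarized above; it is not presented as the authors' exact method. It estimates a mean metric (for example, a model's accuracy) from a small human-labeled set plus a large pool scored only by an AI labeler. The function names `ppi_mean`, `tuned_lambda`, and `ppi_mean_ci` and the weight `lam` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def _as_float(*arrays):
    """Convert inputs to float NumPy arrays (booleans become 0.0/1.0)."""
    return [np.asarray(a, dtype=float) for a in arrays]


def ppi_mean(y_lab, f_lab, f_unlab, lam=1.0):
    """Prediction-powered style estimate of a mean metric such as accuracy.

    y_lab:   human labels on n examples (e.g. 1.0 if the evaluated model was correct)
    f_lab:   synthetic (AI) labels for the same n examples
    f_unlab: synthetic labels for N extra examples with no human label
    lam:     weight on the synthetic labels; lam=0 gives the human-only mean
    """
    y_lab, f_lab, f_unlab = _as_float(y_lab, f_lab, f_unlab)
    # Mean of synthetic labels on the large pool, plus a "rectifier" measured on
    # the human-labeled set that cancels the synthetic labels' bias. For any lam
    # fixed in advance, the expected value of this estimate equals E[Y].
    return lam * f_unlab.mean() + (y_lab - lam * f_lab).mean()


def tuned_lambda(y_lab, f_lab, f_unlab):
    """Plug-in estimate of the variance-minimizing weight Cov(Y, f) / ((1 + n/N) Var(f))."""
    y_lab, f_lab, f_unlab = _as_float(y_lab, f_lab, f_unlab)
    n, N = len(y_lab), len(f_unlab)
    cov = np.cov(y_lab, f_lab)[0, 1]  # sample covariance (ddof=1)
    var_f = np.var(np.concatenate([f_lab, f_unlab]), ddof=1)
    return cov / ((1.0 + n / N) * var_f)


def ppi_mean_ci(y_lab, f_lab, f_unlab, lam=1.0, z=1.96):
    """Estimate plus a normal-approximation confidence interval (z=1.96 gives ~95%)."""
    y_lab, f_lab, f_unlab = _as_float(y_lab, f_lab, f_unlab)
    est = ppi_mean(y_lab, f_lab, f_unlab, lam)
    # The labeled and unlabeled samples are independent, so their variances add.
    var = (lam**2 * np.var(f_unlab, ddof=1) / len(f_unlab)
           + np.var(y_lab - lam * f_lab, ddof=1) / len(y_lab))
    half = z * np.sqrt(var)
    return est, est - half, est + half


# Usage: 4 human-labeled examples, 8 examples labeled only by the AI judge.
y = [1, 0, 1, 1]
f = [1, 0, 0, 1]
f_pool = [1, 1, 0, 1, 0, 1, 1, 0]
print(float(ppi_mean(y, f, f_pool)))           # 0.875
print(float(ppi_mean(y, f, f_pool, lam=0.0)))  # 0.75 (human labels only)
```

With `lam` fixed in advance the estimate is exactly unbiased; plugging in `tuned_lambda` computed from the same labeled data trades this for asymptotic unbiasedness, which sample splitting would restore.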
Cite
Boyeau et al. "AutoEval Done Right: Using Synthetic Data for Model Evaluation." Proceedings of the 42nd International Conference on Machine Learning, 2025.
[Boyeau et al. "AutoEval Done Right: Using Synthetic Data for Model Evaluation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/boyeau2025icml-autoeval/)
@inproceedings{boyeau2025icml-autoeval,
title = {{AutoEval Done Right: Using Synthetic Data for Model Evaluation}},
author = {Boyeau, Pierre and Angelopoulos, Anastasios Nikolas and Li, Tianle and Yosef, Nir and Malik, Jitendra and Jordan, Michael I.},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {5276--5290},
volume = {267},
url = {https://mlanthology.org/icml/2025/boyeau2025icml-autoeval/}
}