AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators

Ryan, Michael J; Zhang, Yanzhe; Salunkhe, Amol; Chu, Yi; Xu, Di; Yang, Diyi

ICLR 2026

Abstract

Evaluating user-facing AI applications remains a central challenge, especially in open-ended domains such as travel planning, clinical note generation, or dialogue. The gold standard is user feedback (e.g., thumbs up/down) or behavioral signals (e.g., retention), but these are often scarce in prototypes and research projects, or too slow to use for system optimization. We present **AutoMetrics**, a framework for synthesizing evaluation metrics under low-data constraints. AutoMetrics combines retrieval from **MetricBank**, a collection of 48 metrics we curate, with automatically generated LLM-as-a-Judge criteria informed by lightweight human feedback. These metrics are composed via regression to maximize correlation with the human signal. AutoMetrics takes you from expensive measures to interpretable automatic metrics. Across 5 diverse tasks, AutoMetrics improves Kendall correlation with human ratings by up to 33.4% over LLM-as-a-Judge while requiring fewer than 100 feedback points. We show that AutoMetrics can be used as a proxy reward with an effect equal to that of a verifiable reward. We release the full AutoMetrics toolkit and MetricBank to accelerate adaptive evaluation of LLM applications.
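
To make the "composed via regression" step concrete, here is a minimal sketch of the general idea: fit a weighted combination of several automatic metric scores to a small set of human ratings, then check Kendall correlation on held-out examples. It is not the AutoMetrics toolkit or its API. The function names (`kendall_tau`, `fit_composite`), the ridge-regularized least-squares fit, the synthetic data, and the 80/40 split are all assumptions made for illustration; the abstract does not specify which regression method the authors use.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return s / (n * (n - 1) / 2)


def fit_composite(metric_scores, human_scores, ridge=1e-3):
    """Fit a linear composite of metric scores to human ratings.

    Hypothetical helper: ridge-regularized least squares on
    standardized metric columns, with an unpenalized intercept.
    Returns a predict function and the per-metric weights.
    """
    X = np.asarray(metric_scores, float)
    y = np.asarray(human_scores, float)
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sigma
    A = np.hstack([Z, np.ones((len(Z), 1))])  # last column is the intercept
    reg = ridge * np.eye(A.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the intercept
    coef = np.linalg.solve(A.T @ A + reg, A.T @ y)

    def predict(X_new):
        Z_new = (np.asarray(X_new, float) - mu) / sigma
        return Z_new @ coef[:-1] + coef[-1]

    return predict, coef[:-1]


# Synthetic stand-in data (an assumption, not results from the paper):
# three candidate metrics, of which only the first two track the human rating.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.05, size=120)

# Fit on 80 labeled points (fewer than 100), evaluate on the remaining 40.
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

predict, weights = fit_composite(X_train, y_train)
tau_composite = kendall_tau(predict(X_test), y_test)
tau_single = kendall_tau(X_test[:, 0], y_test)

print("weights:", np.round(weights, 3))
print("tau (composite):", round(tau_composite, 3))
print("tau (first metric alone):", round(tau_single, 3))
```

The fitted weights stay inspectable, which is one way a regression-based composite can remain interpretable: a metric with a weight near zero contributes little to the final score.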

Cite

Text

Ryan et al. "AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators." International Conference on Learning Representations, 2026.

Markdown

[Ryan et al. "AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/ryan2026iclr-autometrics/)

BibTeX

@inproceedings{ryan2026iclr-autometrics,
  title     = {{AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators}},
  author    = {Ryan, Michael J and Zhang, Yanzhe and Salunkhe, Amol and Chu, Yi and Xu, Di and Yang, Diyi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/ryan2026iclr-autometrics/}
}