Efficient Lifelong Model Evaluation in an Era of Rapid Progress

Abstract

Standardized benchmarks drive progress in machine learning. However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies. In our work, we seek to mitigate this challenge by compiling ever-expanding large-scale benchmarks called Lifelong Benchmarks. As exemplars of our approach, we create Lifelong-CIFAR10 and Lifelong-ImageNet, containing (for now) 1.69M and 1.98M test samples, respectively. While reducing overfitting, lifelong benchmarks introduce a key challenge: the high cost of evaluating a growing number of models across an ever-expanding sample set. To address this challenge, we also introduce an efficient evaluation framework, Sort & Search (S&S), which reuses previously evaluated models by leveraging dynamic programming algorithms to selectively rank and sub-select test samples, enabling cost-effective lifelong benchmarking. Extensive empirical evaluations across ~31,000 models demonstrate that S&S achieves highly efficient approximate accuracy measurement, reducing compute cost from 180 GPU days to 5 GPU hours (~1000x reduction) on a single A100 GPU, with low approximation error. As such, lifelong benchmarks offer a robust, practical solution to the "benchmark exhaustion" problem.
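The Sort & Search idea can be illustrated with a small sketch. It assumes that the results of previously evaluated models are stored as a 0/1 matrix (rows are models, columns are test samples, 1 meaning "classified correctly"). The sort step here is a simplifying heuristic (ranking samples by how many past models got them right), not necessarily the dynamic-programming ranking used in the paper. The search step evaluates a new model on only a few samples, then picks the threshold k on the sorted order ("the first k samples are correct, the rest are wrong") that disagrees least with those observations, and reports k/n as the estimated accuracy. The function names `sort_samples`, `search_threshold`, and `estimate_accuracy` are hypothetical.

```python
from typing import Dict, List, Sequence


def sort_samples(history: Sequence[Sequence[int]]) -> List[int]:
    """Sort step (simplified heuristic): order sample indices from easiest to hardest.

    history[m][i] is 1 if previously evaluated model m got sample i right, else 0.
    Assumption: difficulty is approximated by how many past models got the sample
    right. Python's sort is stable, so ties keep their original index order.
    """
    n = len(history[0])
    counts = [sum(row[i] for row in history) for i in range(n)]
    return sorted(range(n), key=lambda i: -counts[i])


def search_threshold(order: List[int], observed: Dict[int, int]) -> int:
    """Search step: choose k in 0..n minimizing mismatches between the prediction
    'first k sorted samples correct, rest wrong' and the new model's observed labels.

    observed maps a sample index to the new model's 0/1 correctness on it.
    """
    n = len(order)
    rank = {sample: pos for pos, sample in enumerate(order)}
    # Counts shifted by one so the running sum at step k covers positions < k.
    zeros_at = [0] * (n + 1)
    ones_at = [0] * (n + 1)
    for sample, label in observed.items():
        if label:
            ones_at[rank[sample] + 1] += 1
        else:
            zeros_at[rank[sample] + 1] += 1
    total_ones = sum(ones_at)
    best_k, best_err = 0, None
    zeros_before = ones_before = 0
    for k in range(n + 1):
        zeros_before += zeros_at[k]  # observed errors that would be predicted "correct"
        ones_before += ones_at[k]
        err = zeros_before + (total_ones - ones_before)  # plus hits predicted "wrong"
        if best_err is None or err < best_err:  # strict '<' keeps the smallest k on ties
            best_k, best_err = k, err
    return best_k


def estimate_accuracy(history: Sequence[Sequence[int]], observed: Dict[int, int]) -> float:
    """Approximate accuracy of a new model as k / n from a few evaluated samples."""
    order = sort_samples(history)
    return search_threshold(order, observed) / len(order)


# Three past models evaluated on five samples.
history = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
# A new model is evaluated on only three of the five samples.
print(estimate_accuracy(history, {4: 1, 0: 1, 3: 0}))  # prints 0.8
```

Only the evaluated samples contribute to the mismatch count, so the search costs time linear in the number of samples plus the number of evaluations, which is what makes sub-sampling a growing test set cheap in this sketch.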

Cite

Text

Prabhu et al. "Efficient Lifelong Model Evaluation in an Era of Rapid Progress." Neural Information Processing Systems, 2024. doi:10.52202/079017-2357

Markdown

[Prabhu et al. "Efficient Lifelong Model Evaluation in an Era of Rapid Progress." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/prabhu2024neurips-efficient/) doi:10.52202/079017-2357

BibTeX

@inproceedings{prabhu2024neurips-efficient,
  title     = {{Efficient Lifelong Model Evaluation in an Era of Rapid Progress}},
  author    = {Prabhu, Ameya and Udandarao, Vishaal and Torr, Philip H.S. and Bethge, Matthias and Bibi, Adel and Albanie, Samuel},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2357},
  url       = {https://mlanthology.org/neurips/2024/prabhu2024neurips-efficient/}
}