OmniInput: An Evaluation Framework for Deep Learning Models on Internet-Scale Data

Abstract

Evaluating machine learning models is important yet challenging in many real-world scenarios. Traditional evaluation is dataset-driven: models are assessed on predefined benchmark datasets. However, such datasets cover only a limited scope, leaving unanticipated inputs untested and model weaknesses unrevealed. To overcome this problem, we propose OmniInput, a novel approach that evaluates models comprehensively over an entire input space (i.e., internet-scale data). Our method entails efficiently sampling inputs from the model and estimating the corresponding output distribution, together with an innovative way to compute the model's precision-recall curve from that output distribution with only modest human annotation effort. In our experiments, we first validate the correctness of OmniInput within a small input space where brute-force enumeration is still feasible. We then show that OmniInput can quantitatively evaluate more complex models, such as language models (various versions of GPT-2, OLMo, and DistilBERT) and computer vision models, and can reveal interesting patterns in the input space.
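The pipeline described in the abstract (sample inputs from the model itself, estimate the resulting output-score distribution, then fold in per-bin human precision annotations to obtain precision and recall over the whole input space) can be illustrated with a toy sketch. The snippet below is a minimal illustration under strong assumptions: a tiny binary input space, a logistic toy model, a plain Metropolis sampler, and made-up per-bin annotation rates. It is not the authors' implementation, whose sampler and estimators differ.

# Toy sketch of the OmniInput idea (NOT the paper's implementation):
# (1) sample inputs from the model's own input space with a Metropolis
#     sampler biased toward high model scores,
# (2) histogram the scores to estimate the output distribution,
# (3) combine per-bin precision estimates (from human annotation) into
#     an overall precision-recall estimate.
import numpy as np

rng = np.random.default_rng(0)

D = 16                       # toy input space: binary vectors of length D
w = rng.normal(size=D)       # toy "model": score(x) = sigmoid(w . x)

def score(x):
    return 1.0 / (1.0 + np.exp(-x @ w))

# --- (1) Metropolis sampling over the input space ---
def sample_scores(n_steps=20000, beta=4.0):
    x = rng.integers(0, 2, size=D).astype(float)
    s = score(x)
    out = []
    for _ in range(n_steps):
        x_new = x.copy()
        x_new[rng.integers(D)] = 1.0 - x_new[rng.integers(D)]  # flip one bit
        s_new = score(x_new)
        # Metropolis rule: always accept uphill moves, sometimes downhill ones
        if rng.random() < np.exp(beta * (s_new - s)):
            x, s = x_new, s_new
        out.append(s)
    return np.array(out)

# --- (2) estimate the output (score) distribution from the samples ---
hist, edges = np.histogram(sample_scores(), bins=10, range=(0.0, 1.0),
                           density=True)
mass_per_bin = hist * np.diff(edges)   # probability mass of each score bin

# --- (3) precision/recall from per-bin annotation ---
# Pretend annotators labeled a few sampled inputs per bin and reported the
# fraction that are true positives (numbers below are made up):
precision_per_bin = np.linspace(0.05, 0.95, 10)
total_pos = np.sum(mass_per_bin * precision_per_bin)

# Sweep a score threshold over bin edges: precision is the annotation-weighted
# positive fraction above the threshold; recall is the share of all positives
# recovered above it.
for k in range(10):
    mass = mass_per_bin[k:]
    if mass.sum() == 0:
        continue
    pos = np.sum(mass * precision_per_bin[k:])
    print(f"threshold={edges[k]:.1f}  precision={pos / mass.sum():.3f}  "
          f"recall={pos / total_pos:.3f}")

The bit-flip proposal and the acceptance temperature beta are arbitrary choices for the sketch; any sampler that covers the input space and any binning of the output scores would fit the same three-step structure.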

Cite

Text

Liu et al. "OmniInput: An Evaluation Framework for Deep Learning Models on Internet-Scale Data." Transactions on Machine Learning Research, 2025.

Markdown

[Liu et al. "OmniInput: An Evaluation Framework for Deep Learning Models on Internet-Scale Data." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/liu2025tmlr-omniinput/)

BibTeX

@article{liu2025tmlr-omniinput,
  title     = {{OmniInput: An Evaluation Framework for Deep Learning Models on Internet-Scale Data}},
  author    = {Liu, Weitang and Li, Yuelei and Li, Ying Wai and Wang, Zihan and You, Yi-Zhuang and Shang, Jingbo},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/liu2025tmlr-omniinput/}
}