KAIROS: Scalable Model-Agnostic Data Valuation

Zhu, Jiongli; Prashant, Parjanya Prajakta; Cloninger, Alex; Salimi, Babak

NeurIPS 2025

Abstract

Data valuation techniques quantify each training example's contribution to model performance, providing a principled basis for data cleaning, acquisition, and selection. Existing valuation methods remain inadequate: model-based techniques depend on a single fitted model and inherit its biases, while algorithm-based approaches like Data Shapley scale poorly due to their need to train multiple models. Recent work has proposed model-agnostic alternatives based on Wasserstein distance between the training set and a clean reference set, but exact computation is expensive and approximations often misrank examples. We introduce KAIROS, a model-agnostic framework that values examples by their contribution to the Maximum Mean Discrepancy (MMD) between the training set and a clean reference distribution. Unlike Wasserstein methods, MMD admits a closed-form solution that requires no approximations and is scalable to large datasets. Additionally, KAIROS enables efficient online valuation: adding a new batch of $m$ examples requires only $O(mN)$ computation to update all scores, compared to $O(N^2)$ in prior work, where $N$ is the training set size. Empirical evaluations on noise, mislabeling, and poisoning benchmarks show that KAIROS consistently outperforms state-of-the-art baselines in both accuracy and runtime. On ImageNet, KAIROS achieves up to a $15\times$ speedup over the fastest baseline while maintaining superior data valuation quality. Our results demonstrate that model-agnostic methods can match or exceed model-based approaches in performance while scaling to large datasets.
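To make the idea of an MMD-based, incrementally updatable score concrete, here is a minimal NumPy sketch. It is not the KAIROS algorithm: the abstract does not give the exact score definition, so the sketch assumes a simple leave-one-out attribution, score_i = MMD²(X, Y) − MMD²(X \ {x_i}, Y), using the biased squared-MMD estimate and an RBF kernel. The class name `LooMmdScores`, the `gamma` bandwidth, and the choice of kernel are all illustrative assumptions. Caching per-point kernel row sums lets a new batch of $m$ points be absorbed with about $m(N + m + M)$ kernel evaluations, where $M$ is the reference set size.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dist = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=0.5):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean()
            + rbf_kernel(y, y, gamma).mean())


class LooMmdScores:
    """Hypothetical helper: leave-one-out MMD^2 scores from cached kernel sums."""

    def __init__(self, reference, gamma=0.5):
        self.ref, self.gamma = reference, gamma
        self.x = np.empty((0, reference.shape[1]))
        self.row_xx = np.empty(0)  # sum_j k(x_i, x_j), including j == i
        self.row_xy = np.empty(0)  # sum_j k(x_i, y_j)

    def add_batch(self, new):
        """Append m points using about m * (N + m + M) kernel evaluations."""
        k_old_new = rbf_kernel(self.x, new, self.gamma)  # N x m
        k_new_new = rbf_kernel(new, new, self.gamma)     # m x m
        self.row_xx = np.concatenate([
            self.row_xx + k_old_new.sum(1),              # old points gain new neighbours
            k_old_new.sum(0) + k_new_new.sum(1),         # row sums for the new points
        ])
        self.row_xy = np.concatenate(
            [self.row_xy, rbf_kernel(new, self.ref, self.gamma).sum(1)])
        self.x = np.vstack([self.x, new])

    def scores(self):
        """score_i = MMD^2(X, Y) - MMD^2(X without x_i, Y); needs N >= 2."""
        n, m = len(self.x), len(self.ref)
        s_xx, s_xy = self.row_xx.sum(), self.row_xy.sum()
        k_ii = 1.0  # k(x, x) = 1 for the RBF kernel
        full = s_xx / n**2 - 2.0 * s_xy / (n * m)
        without = ((s_xx - 2.0 * self.row_xx + k_ii) / (n - 1) ** 2
                   - 2.0 * (s_xy - self.row_xy) / ((n - 1) * m))
        return full - without  # the reference-only term cancels
```

Under this assumed definition, a positive score means that dropping the point lowers the estimated discrepancy to the reference set. A typical use would be `tracker = LooMmdScores(reference)`, then `tracker.add_batch(batch)` for each incoming batch, then `tracker.scores()`.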

Cite

Text

Zhu et al. "KAIROS: Scalable Model-Agnostic Data Valuation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhu et al. "KAIROS: Scalable Model-Agnostic Data Valuation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhu2025neurips-kairos/)

BibTeX

@inproceedings{zhu2025neurips-kairos,
  title     = {{KAIROS: Scalable Model-Agnostic Data Valuation}},
  author    = {Zhu, Jiongli and Prashant, Parjanya Prajakta and Cloninger, Alex and Salimi, Babak},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhu2025neurips-kairos/}
}