When to Retrain a Machine Learning Model

Abstract

A significant challenge in maintaining real-world machine learning models is responding to the continuous and unpredictable evolution of data. Most practitioners face a difficult question: when should I retrain or update my machine learning model? This seemingly straightforward problem is particularly challenging for three reasons: 1) decisions must be made based on very limited information, as we usually have access to only a few examples; 2) the nature, extent, and impact of the distribution shift are unknown; and 3) it involves specifying a cost ratio between retraining and poor performance, which can be hard to characterize. Existing works address certain aspects of this problem, but none offers a comprehensive solution. Distribution shift detection falls short because it cannot account for the cost trade-off; the scarcity of the data, paired with its unusual structure, makes the problem a poor fit for existing offline reinforcement learning methods; and the online learning formulation overlooks key practical considerations. To address this, we present a principled formulation of the retraining problem and propose an uncertainty-based method that makes decisions by continually forecasting the evolution of model performance evaluated with a bounded metric. Our experiments, which address classification tasks, show that the method consistently outperforms existing baselines on 7 datasets. We thoroughly assess its robustness to varying cost trade-off values and mis-specified cost trade-offs.
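
The cost trade-off described in the abstract can be made concrete with a small illustration. The sketch below is not the paper's method: it is a hypothetical decision rule, assuming that the forecast accuracies of the current and the retrained model are summarized by independent Gaussians (an approximation, since accuracy is bounded in [0, 1]) and that the retraining cost is expressed in the same units as accuracy accumulated over the remaining time steps. The function name `should_retrain` and all of its parameters are assumptions introduced for illustration.

```python
from statistics import NormalDist


def should_retrain(current_mean, current_std, retrained_mean, retrained_std,
                   steps_remaining, retrain_cost, confidence=0.9):
    """Hypothetical uncertainty-aware retraining rule (illustrative only).

    Retrain when we are at least `confidence` sure that the per-step accuracy
    gained by retraining, accumulated over `steps_remaining`, exceeds
    `retrain_cost` (expressed in accuracy units summed over steps).
    """
    if steps_remaining <= 0:
        raise ValueError("steps_remaining must be positive")

    # Per-step accuracy gain needed for retraining to break even.
    required_gain = retrain_cost / steps_remaining

    # Gain = retrained accuracy - current accuracy; under the independence
    # assumption the difference of two Gaussians is again Gaussian.
    gain_mean = retrained_mean - current_mean
    gain_std = (current_std ** 2 + retrained_std ** 2) ** 0.5

    # With no forecast uncertainty the rule reduces to a plain comparison.
    if gain_std == 0:
        return gain_mean > required_gain

    # Probability that the gain exceeds the break-even threshold.
    prob_worthwhile = 1.0 - NormalDist(gain_mean, gain_std).cdf(required_gain)
    return prob_worthwhile >= confidence


# A large forecast gap justifies retraining; a small one does not.
large_gap = should_retrain(0.70, 0.02, 0.90, 0.02, steps_remaining=10, retrain_cost=0.5)
small_gap = should_retrain(0.88, 0.02, 0.90, 0.02, steps_remaining=10, retrain_cost=0.5)
```

Raising `retrain_cost` or shortening `steps_remaining` raises the break-even threshold, so the same forecast gap becomes less likely to trigger retraining. This is the kind of sensitivity to the cost ratio that the abstract says is hard to characterize in practice.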

Cite

Text

Regol et al. "When to Retrain a Machine Learning Model." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Regol et al. "When to Retrain a Machine Learning Model." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/regol2025icml-retrain/)

BibTeX

@inproceedings{regol2025icml-retrain,
  title     = {{When to Retrain a Machine Learning Model}},
  author    = {Regol, Florence and Schwinn, Leo and Sprague, Kyle and Coates, Mark and Markovich, Thomas},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {51369--51404},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/regol2025icml-retrain/}
}