Training and Cross-Validating Machine Learning Pipelines with Limited Memory

Martin Hirzel, Kiran Kate, Louis Mandel, Avraham Shinnar

AutoML 2024 pp. 13/1-25

Abstract

While automated machine learning (AutoML) can save human labor in finding well-performing pipelines, it often suffers from two problems: overfitting and using excessive resources. Unfortunately, the solutions are often at odds: cross-validation helps reduce overfitting at the expense of more resources; conversely, preprocessing on a separate compute cluster and then cross-validating only the final predictor saves resources at the expense of more overfitting. This paper shows how to train and cross-validate entire pipelines on a single moderate machine with limited memory by using monoids, which are associative, thus providing a flexible way for handling large data one batch at a time. To facilitate AutoML, our approach is designed to support the common sklearn APIs used by many AutoML systems for pipelines, training, cross-validation, and several operators. Abstracted behind those APIs, our approach uses task graphs to extend the benefits of monoids from operators to pipelines, and provides a multi-backend implementation. Overall, our approach lets users train and cross-validate pipelines on simple and inexpensive compute infrastructure.
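To make the monoid idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a scaler-like operator whose training state is a count, a sum, and a sum of squares. The names `Moments`, `lift`, `combine`, and `fit_in_batches` are hypothetical and chosen only for illustration. Because `combine` is associative and has an identity element, the batches can be folded one at a time, so only one batch needs to be in memory at once.

```python
from dataclasses import dataclass
from functools import reduce
import math
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Moments:
    """Hypothetical monoid state for one numeric column."""
    n: int = 0               # number of values seen
    total: float = 0.0       # sum of values
    total_sq: float = 0.0    # sum of squared values

    def combine(self, other: "Moments") -> "Moments":
        # Associative binary operation: field-wise addition.
        # Moments() (all zeros) is the identity element.
        return Moments(
            self.n + other.n,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def std(self) -> float:
        # Population standard deviation from the accumulated sums.
        # This formula is simple but can lose precision on large values;
        # clamp at zero to guard against small negative rounding errors.
        variance = self.total_sq / self.n - self.mean ** 2
        return math.sqrt(max(variance, 0.0))


def lift(batch: Sequence[float]) -> Moments:
    """Summarize a single batch as a monoid element."""
    return Moments(len(batch), sum(batch), sum(x * x for x in batch))


def fit_in_batches(batches: Iterable[Sequence[float]]) -> Moments:
    """Fold over the batches, holding only one batch in memory at a time."""
    return reduce(Moments.combine, map(lift, batches), Moments())


# Usage: the batch boundaries are arbitrary; the result does not depend on them.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
batches = [data[0:2], data[2:3], data[3:6]]
stats = fit_in_batches(batches)
print(stats.mean, stats.std)
```

The same associativity that allows batch-at-a-time folding also means partial states can be combined in any grouping, which is the property the abstract relies on for handling large data on a machine with limited memory.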

Cite

Text

Hirzel et al. "Training and Cross-Validating Machine Learning Pipelines with Limited Memory." Proceedings of the Third International Conference on Automated Machine Learning, 2024.

Markdown

[Hirzel et al. "Training and Cross-Validating Machine Learning Pipelines with Limited Memory." Proceedings of the Third International Conference on Automated Machine Learning, 2024.](https://mlanthology.org/automl/2024/hirzel2024automl-training/)

BibTeX

@inproceedings{hirzel2024automl-training,
  title     = {{Training and Cross-Validating Machine Learning Pipelines with Limited Memory}},
  author    = {Hirzel, Martin and Kate, Kiran and Mandel, Louis and Shinnar, Avraham},
  booktitle = {Proceedings of the Third International Conference on Automated Machine Learning},
  year      = {2024},
  pages     = {13/1-25},
  volume    = {256},
  url       = {https://mlanthology.org/automl/2024/hirzel2024automl-training/}
}