Exploring One Million Machine Learning Pipelines: A Benchmarking Study

Abstract

Machine learning solutions are strongly affected by the hyperparameter values of their algorithms. This has motivated many recent research projects on hyperparameter tuning and the proposal of several highly diverse tuning approaches. Rather than proposing a new approach or identifying the most effective existing one, this paper looks for good machine learning solutions by exploring machine learning pipelines. To that end, it benchmarks pipelines with a focus on the interaction between feature preprocessing techniques and classification models. The study evaluates the effectiveness of pipeline combinations, identifying both high-performing and underperforming ones. Additionally, it provides meta-knowledge datasets free of optimization selection bias to foster research in meta-learning and accelerate the development of meta-models. The findings offer insight into the most effective preprocessing and modeling combinations, guiding practitioners and researchers in their selection processes.
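
To make the idea of benchmarking preprocessing and classifier combinations concrete, the following is a minimal sketch in pure Python. It is not the authors' code or experimental setup: the toy dataset, the three preprocessors, the two classifiers, and every function name (`benchmark`, `standardize`, `nearest_centroid`, and so on) are assumptions chosen only for illustration.

```python
from itertools import product
from statistics import mean, pstdev

# Hypothetical toy data: (features, label). The second feature has a much
# larger scale than the first and carries little class information.
TRAIN = [((1.0, 200.0), 0), ((1.2, 900.0), 0), ((0.9, 500.0), 0),
         ((3.0, 300.0), 1), ((3.2, 800.0), 1), ((2.9, 600.0), 1)]
TEST = [((1.1, 700.0), 0), ((3.1, 400.0), 1),
        ((1.0, 850.0), 0), ((3.0, 250.0), 1)]


# Preprocessors: each is fitted on training rows and returns a transform.
def identity(rows):
    return lambda x: x


def standardize(rows):
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return lambda x: tuple((v - m) / s for v, m, s in zip(x, mus, sds))


def min_max(rows):
    cols = list(zip(*rows))
    los = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return lambda x: tuple((v - lo) / sp for v, lo, sp in zip(x, los, spans))


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Classifiers: each is fitted on (X, y) and returns a predict function.
def nearest_neighbor(X, y):
    return lambda x: min(zip(X, y), key=lambda p: sq_dist(p[0], x))[1]


def nearest_centroid(X, y):
    centroids = {}
    for label in set(y):
        members = [xi for xi, yi in zip(X, y) if yi == label]
        centroids[label] = tuple(mean(c) for c in zip(*members))
    return lambda x: min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def benchmark(train, test, preprocessors, classifiers):
    """Score every preprocessor x classifier pair by test accuracy."""
    results = []
    for (p_name, prep), (c_name, clf) in product(preprocessors.items(),
                                                 classifiers.items()):
        transform = prep([x for x, _ in train])  # fit on training data only
        predict = clf([transform(x) for x, _ in train], [y for _, y in train])
        acc = mean(1.0 if predict(transform(x)) == y else 0.0 for x, y in test)
        results.append((p_name, c_name, acc))
    return sorted(results, key=lambda r: r[2], reverse=True)


PREPROCESSORS = {"identity": identity, "standardize": standardize,
                 "min_max": min_max}
CLASSIFIERS = {"nearest_neighbor": nearest_neighbor,
               "nearest_centroid": nearest_centroid}

RESULTS = benchmark(TRAIN, TEST, PREPROCESSORS, CLASSIFIERS)
for p_name, c_name, acc in RESULTS:
    print(f"{p_name:12s} + {c_name:16s} accuracy={acc:.2f}")
```

Evaluating the full grid exhaustively, rather than letting an optimizer choose which combinations to try, is what keeps a table of results like this free of selection bias. A study at the scale the abstract describes would repeat this over many datasets, preprocessors, and models.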

Cite

Text

Alcobaça and De Carvalho. "Exploring One Million Machine Learning Pipelines: A Benchmarking Study." Proceedings of the Fourth International Conference on Automated Machine Learning, 2025.

Markdown

[Alcobaça and De Carvalho. "Exploring One Million Machine Learning Pipelines: A Benchmarking Study." Proceedings of the Fourth International Conference on Automated Machine Learning, 2025.](https://mlanthology.org/automl/2025/alcobaca2025automl-exploring/)

BibTeX

@inproceedings{alcobaca2025automl-exploring,
  title     = {{Exploring One Million Machine Learning Pipelines: A Benchmarking Study}},
  author    = {Alcobaça, Edesio and De Carvalho, Andre Carlos Ponce de Leon Ferreira},
  booktitle = {Proceedings of the Fourth International Conference on Automated Machine Learning},
  year      = {2025},
  pages     = {22/1-34},
  volume    = {293},
  url       = {https://mlanthology.org/automl/2025/alcobaca2025automl-exploring/}
}