Regression-Stratified Sampling for Optimized Algorithm Selection in Time-Constrained Tabular AutoML
Abstract
Selecting a machine-learning (ML) algorithm is an essential step in tabular AutoML training. Finding an optimized algorithm in a search space can be expensive for large tabular datasets, especially under time constraints. In this study, we introduce a novel Regression-Stratified Sampling approach that optimizes algorithm selection by minimizing the distance between the distribution of the target variable(s) in a data subset and in the full-scale dataset, measured via their Probability Density Functions (PDFs). Additionally, we introduce a PDF Energy metric, based on relative entropy, to identify an optimized ML algorithm from the search space. Our evaluation results demonstrate that the proposed approach successfully selects optimized algorithms from a search space of atomic and ensemble models, outperforming simple random sampling. We also evaluate the PDF Energy metric against Kullback-Leibler (KL) divergence, where the PDF Energy metric proves superior for algorithm selection. Furthermore, we validate our approach for ML algorithm selection in an end-to-end scenario across 31 public datasets using 6 tabular AutoML tools. The empirical results indicate that the proposed method makes efficient use of Regression-Stratified Sampling and, through the PDF Energy metric, reliably identifies an optimized ML algorithm for tabular data under time constraints.
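The idea of drawing a subset whose target distribution stays close to that of the full dataset can be illustrated with a minimal sketch. The abstract does not define the sampling procedure or the PDF Energy metric, so everything below is an assumption: the strata are quantile bins of a continuous target, the PDFs are estimated with histograms, and plain KL divergence (relative entropy) stands in as the distance. The function names `stratified_regression_sample` and `kl_divergence` are hypothetical and are not taken from the paper.

```python
import numpy as np

def stratified_regression_sample(y, frac, n_bins=10, seed=0):
    """Return indices of a subset whose target distribution tracks the full y.

    Assumption: strata are quantile bins of the continuous target; the
    paper's actual procedure may differ.
    """
    rng = np.random.default_rng(seed)
    # Quantile edges give roughly equal-sized strata.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1))
    # Use interior edges only, so bin ids fall in 0..n_bins-1.
    bin_ids = np.digitize(y, edges[1:-1])
    chosen = []
    for b in np.unique(bin_ids):
        members = np.flatnonzero(bin_ids == b)
        # Sample the same fraction from every stratum (at least one row).
        k = min(members.size, max(1, int(round(frac * members.size))))
        chosen.append(rng.choice(members, size=k, replace=False))
    return np.concatenate(chosen)

def kl_divergence(y_full, y_sub, n_bins=30, eps=1e-9):
    """KL(full || subset) between histogram estimates of the two target PDFs."""
    lo, hi = y_full.min(), y_full.max()
    p, _ = np.histogram(y_full, bins=n_bins, range=(lo, hi))
    q, _ = np.histogram(y_sub, bins=n_bins, range=(lo, hi))
    # Smooth empty bins, then renormalize so both sum to one.
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic skewed target, used only for illustration.
rng = np.random.default_rng(42)
y = rng.lognormal(mean=0.0, sigma=1.0, size=20_000)

strat_idx = stratified_regression_sample(y, frac=0.05)
rand_idx = rng.choice(y.size, size=strat_idx.size, replace=False)

kl_strat = kl_divergence(y, y[strat_idx])
kl_rand = kl_divergence(y, y[rand_idx])
print(f"stratified subset KL: {kl_strat:.5f}")
print(f"random subset KL:     {kl_rand:.5f}")
```

A lower divergence means the subset's target histogram is closer to the full dataset's. Which sampler gives the lower value on a given dataset is an empirical question, and this sketch makes no claim about the results reported in the paper.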
Cite
Text
Bahrami et al. "Regression-Stratified Sampling for Optimized Algorithm Selection in Time-Constrained Tabular AutoML." ICML 2024 Workshops: SPIGM, 2024.
Markdown
[Bahrami et al. "Regression-Stratified Sampling for Optimized Algorithm Selection in Time-Constrained Tabular AutoML." ICML 2024 Workshops: SPIGM, 2024.](https://mlanthology.org/icmlw/2024/bahrami2024icmlw-regressionstratified/)
BibTeX
@inproceedings{bahrami2024icmlw-regressionstratified,
title = {{Regression-Stratified Sampling for Optimized Algorithm Selection in Time-Constrained Tabular AutoML}},
author = {Bahrami, Mehdi and Hasegawa, So and Liu, Lei and Chen, Wei-Peng},
booktitle = {ICML 2024 Workshops: SPIGM},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/bahrami2024icmlw-regressionstratified/}
}