Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications
Abstract
Many industry verticals are confronted with small-sized tabular data. In this low-data regime, it is currently unclear whether the best performance can be expected from simple baselines or from more complex machine learning approaches that leverage meta-learning and ensembling. On 44 tabular classification datasets with sample sizes $\leq$ 500, we find that L2-regularized logistic regression performs similarly to state-of-the-art automated machine learning (AutoML) frameworks (AutoPrognosis, AutoGluon) and off-the-shelf deep neural networks (TabPFN, HyperFast) on the majority of the benchmark datasets. We therefore recommend considering logistic regression as the first choice for data-scarce applications with tabular data, and we provide practitioners with best practices for further method selection.
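To make the baseline named in the abstract concrete, the following is a minimal pure-Python sketch of binary L2-regularized logistic regression trained by batch gradient descent. It is not the authors' pipeline: the synthetic two-cluster data, the function names (`fit_l2_logreg`, `predict`, `make_toy_data`), and the learning rate, epoch count, and penalty strength are all assumptions chosen for illustration. In practice one would more likely use a library implementation, such as scikit-learn's `LogisticRegression`, which applies an L2 penalty by default.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logreg(X, y, l2=1.0, lr=0.1, epochs=500):
    """Fit binary logistic regression by batch gradient descent.

    Minimizes mean log-loss + (l2 / (2 * n)) * ||w||^2.
    The bias term is not penalized. All defaults are illustrative.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Data gradient plus the L2 penalty gradient (l2 * w_j).
            w[j] -= lr * (grad_w[j] + l2 * w[j]) / n
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    # Classify as 1 when the predicted probability is at least 0.5.
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def make_toy_data(n=200, seed=0):
    # Hypothetical small dataset: two Gaussian clusters in two dimensions.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

w, b = fit_l2_logreg(X_train, y_train, l2=1.0)
accuracy = sum(p == t for p, t in zip(predict(X_test, w, b), y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The `l2` argument controls how strongly the weights are shrunk toward zero, which limits overfitting when only a few hundred samples are available. A real evaluation would tune it by cross-validation rather than fixing it as done here.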
Cite
Text
Knauer and Rodner. "Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications." ICLR 2024 Workshops: PML4LRS, 2024.
Markdown
[Knauer and Rodner. "Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications." ICLR 2024 Workshops: PML4LRS, 2024.](https://mlanthology.org/iclrw/2024/knauer2024iclrw-squeezing/)
BibTeX
@inproceedings{knauer2024iclrw-squeezing,
title = {{Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications}},
author = {Knauer, Ricardo and Rodner, Erik},
booktitle = {ICLR 2024 Workshops: PML4LRS},
year = {2024},
url = {https://mlanthology.org/iclrw/2024/knauer2024iclrw-squeezing/}
}