An Expanded Benchmark That Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets

Abstract

Active Learning (AL) addresses the crucial challenge of enabling machines to efficiently gather labeled examples through strategic queries. Among the many AL strategies, Uncertainty Sampling (US) stands out as one of the most widely adopted. US queries the example(s) that the current model finds most uncertain, and it has proven both straightforward and effective. Despite claims in the literature suggesting superior alternatives to US, community-wide acceptance of those alternatives remains elusive. In fact, existing benchmarks for tabular datasets present conflicting conclusions on the continued competitiveness of US. In this study, we review the literature on AL strategies in the last decade and build the most comprehensive open-source AL benchmark to date to understand the relative merits of different AL strategies. The benchmark surpasses existing ones by encompassing a broader coverage of strategies, models, and data. By investigating the conflicting conclusions in existing tabular AL benchmarks under broad AL experimental settings, we uncover fresh insights into an often-overlooked issue: **model compatibility** between the model used for querying and the model used for learning in US. Specifically, we observe that adopting different models for querying unlabeled examples and for the learning task degrades US's effectiveness. Notably, our findings affirm that US maintains a competitive edge over other strategies when paired with compatible models. These findings have practical implications and provide a concrete recipe for AL practitioners, empowering them to make informed decisions when working on tabular classification tasks with limited labeled data. The code for this project is available at https://github.com/ariapoy/active-learning-benchmark.
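To make the abstract's terms concrete, the following is a minimal sketch of pool-based Uncertainty Sampling with a scikit-learn-style probabilistic classifier. It is illustrative only and does not reflect the benchmark's actual API; note that the *same* model is used for querying and learning, i.e., the compatible-model setting the paper recommends.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def uncertainty_query(model, X_pool, batch_size=1):
    """Least-confidence US: pick the pool example(s) whose
    top-class predicted probability is lowest."""
    proba = model.predict_proba(X_pool)
    confidence = proba.max(axis=1)              # confidence in the predicted class
    return np.argsort(confidence)[:batch_size]  # least confident first


# Toy AL loop on synthetic, linearly separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

labeled = list(range(10))                       # small seed labeled set
pool = [i for i in range(200) if i not in labeled]

model = LogisticRegression()
for _ in range(5):                              # 5 query rounds, 5 queries each
    model.fit(X[labeled], y[labeled])
    picked = uncertainty_query(model, X[pool], batch_size=5)
    newly = [pool[i] for i in picked]
    labeled += newly                            # oracle reveals labels y[newly]
    pool = [i for i in pool if i not in newly]

print(len(labeled))                             # 35 labeled examples after 5 rounds
```

The model-compatibility issue the paper studies arises when `uncertainty_query` is driven by one model while a different model is trained for the final task; this sketch deliberately avoids that mismatch.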

Cite

Text

Lu et al. "An Expanded Benchmark That Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets." Transactions on Machine Learning Research, 2025.

Markdown

[Lu et al. "An Expanded Benchmark That Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/lu2025tmlr-expanded/)

BibTeX

@article{lu2025tmlr-expanded,
  title     = {{An Expanded Benchmark That Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets}},
  author    = {Lu, Po-Yi and Cheng, Yi-Jie and Li, Chun-Liang and Lin, Hsuan-Tien},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/lu2025tmlr-expanded/}
}