Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples
Abstract
Neural network-based active learning (NAL) is a cost-effective data selection technique that uses neural networks to select and train on a small subset of samples. While existing work has developed various effective or theory-justified NAL algorithms, the understanding of the two commonly used query criteria of NAL, uncertainty-based and diversity-based, remains in its infancy. In this work, we move one step forward by offering a unified explanation, from a feature learning view, for the success of NAL under both query criteria. Specifically, we consider a feature-noise data model comprising easy-to-learn or hard-to-learn features disrupted by noise, and analyze 2-layer NN-based NAL in the pool-based scenario. We provably show that both uncertainty-based and diversity-based NAL are inherently amenable to one and the same principle, i.e., striving to prioritize samples that contain yet-to-be-learned features. We further prove that this shared principle is the key to their success: achieving a small test error with a small labeled set. In contrast, strategy-free passive learning exhibits a large test error because it learns the yet-to-be-learned features inadequately, and therefore needs a significantly larger label complexity to reduce the test error sufficiently. Experimental results validate our findings.
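As an illustration of the pool-based, uncertainty-based query criterion mentioned in the abstract, the following is a minimal sketch of a generic margin-based query step. It is not the algorithm analyzed in the paper: the function name `uncertainty_query`, the margin score, and the toy probabilities are assumptions made only for this example.

```python
import numpy as np

def uncertainty_query(probs, k):
    """Return indices of the k pool samples with the smallest confidence margin.

    probs: array of shape (n_pool, n_classes) holding predicted class
    probabilities for the unlabeled pool (hypothetical input for this sketch).
    """
    # Sort each row so the two largest probabilities sit in the last two columns.
    sorted_probs = np.sort(probs, axis=1)
    # Margin = top probability minus runner-up; a small margin means high uncertainty.
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]
    # Pick the k most uncertain samples (stable sort keeps ties in pool order).
    return np.argsort(margin, kind="stable")[:k]

# Toy pool of four unlabeled samples with made-up binary predictions.
pool_probs = np.array([
    [0.98, 0.02],  # confidently predicted
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.51, 0.49],  # most uncertain
])
selected = uncertainty_query(pool_probs, k=2)
print(selected)
```

In this toy pool the two samples with near-even predictions are queried first, which loosely mirrors the idea of prioritizing samples the model has not yet learned to classify.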
Cite
Text
Bu et al. "Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples." International Conference on Machine Learning, 2024.
Markdown
[Bu et al. "Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/bu2024icml-provably/)
BibTeX
@inproceedings{bu2024icml-provably,
title = {{Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples}},
author = {Bu, Dake and Huang, Wei and Suzuki, Taiji and Cheng, Ji and Zhang, Qingfu and Xu, Zhiqiang and Wong, Hau-San},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {4642-4695},
volume = {235},
url = {https://mlanthology.org/icml/2024/bu2024icml-provably/}
}