Real: A Representative Error-Driven Approach for Active Learning

Abstract

Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose Real, a novel approach to select data instances with Representative Errors for Active Learning. It identifies minority predictions as pseudo errors within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that Real consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that Real selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
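The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes cluster assignments have already been computed by some clustering step, estimates a cluster's error density as the fraction of its members whose predicted label disagrees with the cluster majority, and splits the budget in proportion to that fraction. The function name `select_pseudo_errors`, its parameters, and the proportional allocation rule are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def select_pseudo_errors(cluster_ids, predictions, budget, seed=0):
    """Return up to `budget` indices of pseudo errors.

    A pseudo error is an instance whose predicted label differs from the
    majority prediction of its cluster. The per-cluster quota is
    proportional to the cluster's pseudo-error fraction (an assumed
    stand-in for the error density mentioned in the abstract).
    """
    rng = random.Random(seed)

    # Group instance indices by cluster.
    members = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        members[cluster].append(idx)

    pseudo_errors = {}
    density = {}
    for cluster, idxs in members.items():
        majority, _ = Counter(predictions[i] for i in idxs).most_common(1)[0]
        errors = [i for i in idxs if predictions[i] != majority]
        pseudo_errors[cluster] = errors
        density[cluster] = len(errors) / len(idxs)

    total = sum(density.values())
    if total == 0:
        return []  # every cluster is unanimous, so there are no pseudo errors

    # Visit denser clusters first so that truncation to the budget
    # drops candidates from the least error-dense clusters.
    selected = []
    for cluster in sorted(density, key=density.get, reverse=True):
        errors = pseudo_errors[cluster]
        quota = min(len(errors), math.ceil(budget * density[cluster] / total))
        selected.extend(rng.sample(errors, quota))
    return selected[:budget]
```

For example, with `cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]` and `predictions = ["a", "a", "a", "b", "x", "y", "y", "y"]`, each cluster contains one minority prediction, so a budget of 2 selects one instance from each cluster. The function can return fewer than `budget` indices when there are not enough pseudo errors to fill it.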

Cite

Text

Chen et al. "Real: A Representative Error-Driven Approach for Active Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43412-9_2

Markdown

[Chen et al. "Real: A Representative Error-Driven Approach for Active Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/chen2023ecmlpkdd-real/) doi:10.1007/978-3-031-43412-9_2

BibTeX

@inproceedings{chen2023ecmlpkdd-real,
  title     = {{Real: A Representative Error-Driven Approach for Active Learning}},
  author    = {Chen, Cheng and Wang, Yong and Liao, Lizi and Chen, Yueguo and Du, Xiaoyong},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {20-37},
  doi       = {10.1007/978-3-031-43412-9_2},
  url       = {https://mlanthology.org/ecmlpkdd/2023/chen2023ecmlpkdd-real/}
}