Uncertainty-Based Active Learning for Reading Comprehension

Jing Wang, Jie Shen, Xiaofei Ma, Andrew Arnold

TMLR 2022


Abstract

Recent years have witnessed a surge of successful applications of machine reading comprehension. Of central importance to these tasks is the availability of massive amounts of labeled data, which facilitates the training of large-scale neural networks. However, in many real-world problems, annotated data are expensive to gather, not only because of time and budget costs but also because of domain-specific restrictions such as privacy for healthcare data. In this regard, we propose an uncertainty-based active learning algorithm for reading comprehension, which interleaves data annotation and model updating to reduce the demand for labeling. Our key techniques are twofold: 1) an unsupervised uncertainty-based sampling scheme that queries the labels of the most informative instances with respect to the currently learned model; and 2) an adaptive loss minimization paradigm that simultaneously fits the data and controls the degree of model updating. We demonstrate on benchmark datasets that 25% fewer labeled samples suffice to achieve similar or even improved performance. Our results provide strong evidence that, for label-demanding scenarios, the proposed approach offers a practical guide to data collection and model training.
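The abstract does not say which uncertainty measure the sampling scheme uses, so the following is only a minimal sketch of generic uncertainty-based querying, assuming Shannon entropy over a model's predicted answer distribution as the score. The function names `entropy` and `select_most_uncertain`, the `budget` parameter, and the toy probabilities are all hypothetical illustrations and are not taken from the paper or its code.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(
    pool_probs: Sequence[Sequence[float]], budget: int
) -> List[int]:
    """Return indices of the `budget` unlabeled instances with highest entropy.

    ASSUMPTION: entropy stands in for the paper's uncertainty score,
    which the abstract does not specify.
    """
    scores = [entropy(p) for p in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical predicted answer distributions for four unlabeled questions.
pool = [
    [0.97, 0.01, 0.01, 0.01],  # model is confident
    [0.25, 0.25, 0.25, 0.25],  # model is maximally uncertain
    [0.60, 0.30, 0.05, 0.05],
    [0.40, 0.40, 0.10, 0.10],
]

# Indices of the instances that would be sent for annotation this round.
queried = select_most_uncertain(pool, budget=2)
print(queried)
```

In an active learning loop, the queried instances would be labeled, added to the training set, and the model updated before the pool is scored again. The adaptive loss minimization step is not sketched here because the abstract gives no detail about its form.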


Cite

Text

Wang et al. "Uncertainty-Based Active Learning for Reading Comprehension." Transactions on Machine Learning Research, 2022.

Markdown

[Wang et al. "Uncertainty-Based Active Learning for Reading Comprehension." Transactions on Machine Learning Research, 2022.](https://mlanthology.org/tmlr/2022/wang2022tmlr-uncertaintybased/)

BibTeX

@article{wang2022tmlr-uncertaintybased,
  title     = {{Uncertainty-Based Active Learning for Reading Comprehension}},
  author    = {Wang, Jing and Shen, Jie and Ma, Xiaofei and Arnold, Andrew},
  journal   = {Transactions on Machine Learning Research},
  year      = {2022},
  url       = {https://mlanthology.org/tmlr/2022/wang2022tmlr-uncertaintybased/}
}