Uncertainty-Based Active Learning for Reading Comprehension
Abstract
Recent years have witnessed a surge of successful applications of machine reading comprehension. Of central importance to these tasks is the availability of massive amounts of labeled data, which facilitates the training of large-scale neural networks. However, in many real-world problems, annotated data are expensive to gather, not only because of time and budget costs, but also because of domain-specific restrictions such as privacy for healthcare data. In this regard, we propose an uncertainty-based active learning algorithm for reading comprehension, which interleaves data annotation and model updating to reduce the demand for labeling. Our key techniques are two-fold: 1) an unsupervised uncertainty-based sampling scheme that queries the labels of the most informative instances with respect to the currently learned model; and 2) an adaptive loss minimization paradigm that simultaneously fits the data and controls the degree of model updating. We demonstrate on benchmark datasets that 25% fewer labeled samples suffice to achieve similar or even improved performance. Our results provide strong evidence that, for label-demanding scenarios, the proposed approach offers a practical guide to data collection and model training.
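As a rough illustration of the first technique, the following is a minimal sketch of generic uncertainty sampling, not the paper's actual scheme. It assumes that uncertainty is measured by the Shannon entropy of the model's predicted distribution over candidate answers; the abstract does not specify the measure, and the names `entropy`, `select_most_uncertain`, and `pool_probs`, as well as the example probabilities, are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` unlabeled instances with the highest predictive entropy."""
    scores = [entropy(p) for p in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical predicted answer distributions for four unlabeled instances
# (assumption: produced by the currently learned model).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]

# Indices of the instances whose labels would be queried next.
picked = select_most_uncertain(pool_probs, budget=2)
print(picked)
```

In an active learning loop, the selected instances would be sent for annotation, added to the labeled set, and the model updated before the next round of selection.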
Cite
Text
Wang et al. "Uncertainty-Based Active Learning for Reading Comprehension." Transactions on Machine Learning Research, 2022.
Markdown
[Wang et al. "Uncertainty-Based Active Learning for Reading Comprehension." Transactions on Machine Learning Research, 2022.](https://mlanthology.org/tmlr/2022/wang2022tmlr-uncertaintybased/)
BibTeX
@article{wang2022tmlr-uncertaintybased,
title = {{Uncertainty-Based Active Learning for Reading Comprehension}},
author = {Wang, Jing and Shen, Jie and Ma, Xiaofei and Arnold, Andrew},
journal = {Transactions on Machine Learning Research},
year = {2022},
url = {https://mlanthology.org/tmlr/2022/wang2022tmlr-uncertaintybased/}
}