Active Learning over Multiple Domains in Natural Language Tasks

Abstract

Studies of active learning traditionally assume the target and source data stem from a single domain. However, in realistic applications, practitioners often require active learning with multiple sources of out-of-distribution data, where it is unclear a priori which data sources will help or hurt the target domain. We survey a wide variety of techniques in active learning (AL), domain shift detection (DS), and multi-domain sampling to examine this challenging setting for question answering and sentiment analysis. Among 18 acquisition functions from 4 families of methods, we find that H-Divergence methods, and particularly our proposed variant DAL-E, yield effective results, averaging 2-3% improvement over the random baseline. Our findings constitute the first comprehensive analysis of both existing and novel methods for practitioners faced with multi-domain active learning for natural language tasks.
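
To give a concrete sense of the H-Divergence family the abstract refers to, below is a minimal, hedged sketch of discriminative active learning (DAL), a representative member of that family: a binary discriminator is trained to separate labeled from unlabeled representations, and the unlabeled points it most confidently flags are acquired, since the labeled pool covers them least. This is an illustration only, not the paper's DAL-E variant; the scikit-learn discriminator, pre-computed encoder features, and the `dal_acquire` helper are all assumptions for demonstration.

```python
# Illustrative sketch of discriminative active learning (DAL), one member of
# the H-Divergence family surveyed in the paper. NOT the paper's DAL-E
# variant; model and feature choices here are assumptions for demonstration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def dal_acquire(labeled_feats: np.ndarray,
                unlabeled_feats: np.ndarray,
                budget: int) -> np.ndarray:
    """Return indices of `budget` unlabeled examples to annotate next.

    Trains a discriminator to separate the labeled pool from the unlabeled
    pool; the unlabeled points scored most confidently as 'unlabeled' are
    the least represented in the labeled set, so they are acquired first.
    """
    X = np.vstack([labeled_feats, unlabeled_feats])
    y = np.concatenate([np.zeros(len(labeled_feats)),    # 0 = labeled pool
                        np.ones(len(unlabeled_feats))])  # 1 = unlabeled pool
    disc = LogisticRegression(max_iter=1000).fit(X, y)
    # Probability that each unlabeled point belongs to the unlabeled pool.
    p_unlabeled = disc.predict_proba(unlabeled_feats)[:, 1]
    return np.argsort(-p_unlabeled)[:budget]

# Example usage with random vectors standing in for encoder features.
rng = np.random.default_rng(0)
picked = dal_acquire(rng.normal(size=(200, 768)),
                     rng.normal(size=(1000, 768)) + 0.5,
                     budget=16)
print(picked)
```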

Cite

Text

Longpre et al. "Active Learning over Multiple Domains in Natural Language Tasks." NeurIPS 2022 Workshops: DistShift, 2022.

Markdown

[Longpre et al. "Active Learning over Multiple Domains in Natural Language Tasks." NeurIPS 2022 Workshops: DistShift, 2022.](https://mlanthology.org/neuripsw/2022/longpre2022neuripsw-active/)

BibTeX

@inproceedings{longpre2022neuripsw-active,
  title     = {{Active Learning over Multiple Domains in Natural Language Tasks}},
  author    = {Longpre, Shayne and Reisler, Julia Rachel and Huang, Edward Greg and Lu, Yi and Frank, Andrew and Ramesh, Nikhil and DuBois, Christopher},
  booktitle = {NeurIPS 2022 Workshops: DistShift},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/longpre2022neuripsw-active/}
}