Bilevel Optimization to Learn Training Distributions for Language Modeling Under Domain Shift

Abstract

Language models trained on very large web corpora have become a central piece of modern language processing. In this paradigm, the large, heterogeneous training set rarely matches the distribution of the application domain. This work considers modifying the training distribution when one can observe a small sample of data reflecting the test conditions. We propose an algorithm based on a recent formulation of this problem as an online bilevel optimization problem. We show that this approach compares favorably with alternative strategies from the domain adaptation literature. [Extended version available at arXiv:2311.11973]
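
The sketch below is a minimal, hypothetical illustration of the general idea described in the abstract: learning a distribution over training domains so that the resulting model does well on a small target-domain sample, with an inner step on the weighted training loss and an outer step on the domain weights. It is not the paper's algorithm; the toy data, the gradient-alignment surrogate for the outer update, and all names (`domain_logits`, `inner_opt`, `outer_opt`, learning rates) are assumptions made for illustration.

```python
# Hypothetical sketch of bilevel-style training-distribution learning.
# NOT the paper's algorithm: toy regression data and a one-step
# gradient-alignment surrogate stand in for the real inner/outer problems.
import torch

torch.manual_seed(0)

d = 10
true_w = torch.randn(d)

def sample_domain(k, n=64):
    # Each "domain" shifts the input mean; domain 0 matches the target conditions.
    x = torch.randn(n, d) + 0.5 * k
    y = x @ true_w + 0.1 * torch.randn(n)
    return x, y

model = torch.nn.Linear(d, 1, bias=False)
num_domains = 4
# Logits over domains define the learned training distribution (softmax-normalized).
domain_logits = torch.zeros(num_domains, requires_grad=True)
inner_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
outer_opt = torch.optim.SGD([domain_logits], lr=1e-1)

# Small sample reflecting the test conditions.
target_x, target_y = sample_domain(0, n=32)

for step in range(500):
    weights = torch.softmax(domain_logits, dim=0)

    # Inner step: update the model on the weighted training loss.
    inner_opt.zero_grad()
    train_loss = 0.0
    for k in range(num_domains):
        x, y = sample_domain(k)
        train_loss = train_loss + weights[k].detach() * torch.nn.functional.mse_loss(
            model(x).squeeze(-1), y
        )
    train_loss.backward()
    inner_opt.step()

    # Outer step (one-step surrogate): favor domains whose gradients align
    # with the gradient of the loss on the small target sample.
    target_loss = torch.nn.functional.mse_loss(model(target_x).squeeze(-1), target_y)
    target_grad = torch.autograd.grad(target_loss, model.weight)[0]
    alignments = []
    for k in range(num_domains):
        x, y = sample_domain(k)
        loss_k = torch.nn.functional.mse_loss(model(x).squeeze(-1), y)
        grad_k = torch.autograd.grad(loss_k, model.weight)[0]
        alignments.append(torch.sum(grad_k * target_grad))
    outer_opt.zero_grad()
    outer_loss = -(torch.softmax(domain_logits, dim=0) * torch.stack(alignments)).sum()
    outer_loss.backward()
    outer_opt.step()

print("learned domain weights:", torch.softmax(domain_logits, dim=0).detach())
```

Run as-is, the loop concentrates the learned distribution on the domain closest to the target sample; in practice the outer update would use a proper hypergradient rather than this alignment heuristic.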

Cite

Text

Grangier et al. "Bilevel Optimization to Learn Training Distributions for Language Modeling Under Domain Shift." NeurIPS 2023 Workshops: DistShift, 2023.

Markdown

[Grangier et al. "Bilevel Optimization to Learn Training Distributions for Language Modeling Under Domain Shift." NeurIPS 2023 Workshops: DistShift, 2023.](https://mlanthology.org/neuripsw/2023/grangier2023neuripsw-bilevel/)

BibTeX

@inproceedings{grangier2023neuripsw-bilevel,
  title     = {{Bilevel Optimization to Learn Training Distributions for Language Modeling Under Domain Shift}},
  author    = {Grangier, David and Ablin, Pierre and Hannun, Awni},
  booktitle = {NeurIPS 2023 Workshops: DistShift},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/grangier2023neuripsw-bilevel/}
}