Diversify and Disambiguate: Learning from Underspecified Data

Abstract

Many datasets are underspecified, meaning that several solutions are equally viable for a given task. Underspecified datasets can be problematic for methods that learn a single hypothesis, because different functions that achieve low training loss can focus on different predictive features and thus produce widely varying predictions on out-of-distribution data. We propose DivDis, a simple two-stage framework that first learns a collection of diverse hypotheses for a task by leveraging unlabeled data from the test distribution. We then disambiguate by selecting one of the discovered hypotheses using minimal additional supervision, in the form of a few additional labels or inspection of function visualizations. We demonstrate the ability of DivDis to find robust hypotheses in image classification and natural language processing problems with underspecification.
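
To make the two stages concrete, below is a minimal PyTorch sketch. The abstract does not spell out the diversification objective, so this assumes one natural instantiation: a multi-head model trained with cross-entropy on labeled source data plus a pairwise mutual-information penalty between heads' predictions on unlabeled target data, followed by picking the head that scores best on a handful of extra labels. The function names (`divdis_loss`, `disambiguate`), the binary-classification setting, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mutual_information(p1, p2, eps=1e-8):
    """Empirical mutual information between two heads' binary predictions.

    p1, p2: (batch, 2) softmax outputs. The joint distribution is estimated
    as the batch average of the outer product of the two heads' outputs.
    """
    joint = torch.einsum('bi,bj->ij', p1, p2) / p1.shape[0]  # (2, 2)
    marg1 = joint.sum(dim=1, keepdim=True)                   # (2, 1)
    marg2 = joint.sum(dim=0, keepdim=True)                   # (1, 2)
    return (joint * (torch.log(joint + eps)
                     - torch.log(marg1 + eps)
                     - torch.log(marg2 + eps))).sum()

def divdis_loss(logits_labeled, y, logits_unlabeled, lam=1.0):
    """Stage 1 (diversify): fit every head on labeled source data while
    decorrelating head predictions on unlabeled target-distribution data.

    logits_labeled:   (batch, n_heads, 2) logits on labeled examples
    y:                (batch,) integer labels
    logits_unlabeled: (batch, n_heads, 2) logits on unlabeled examples
    """
    n_heads = logits_labeled.shape[1]
    # Each head independently fits the labeled training data.
    ce = sum(F.cross_entropy(logits_labeled[:, h], y) for h in range(n_heads))
    # Penalize statistical dependence between every pair of heads on the
    # unlabeled data, pushing them toward distinct predictive features.
    probs = logits_unlabeled.softmax(dim=-1)
    mi = sum(mutual_information(probs[:, i], probs[:, j])
             for i in range(n_heads) for j in range(i + 1, n_heads))
    return ce / n_heads + lam * mi

def disambiguate(logits_fewshot, y_fewshot):
    """Stage 2 (disambiguate): select the head with the highest accuracy
    on a small set of additionally labeled target examples."""
    preds = logits_fewshot.argmax(dim=-1)                # (batch, n_heads)
    acc = (preds == y_fewshot[:, None]).float().mean(dim=0)
    return acc.argmax().item()
```

In this sketch, the mutual-information term vanishes when the heads' predictions are statistically independent on the target distribution, which is what forces the discovered hypotheses apart; disambiguation then needs only enough labels to rank the heads, not to retrain them.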

Cite

Text

Lee et al. "Diversify and Disambiguate: Learning from Underspecified Data." ICML 2022 Workshops: SCIS, 2022.

Markdown

[Lee et al. "Diversify and Disambiguate: Learning from Underspecified Data." ICML 2022 Workshops: SCIS, 2022.](https://mlanthology.org/icmlw/2022/lee2022icmlw-diversify/)

BibTeX

@inproceedings{lee2022icmlw-diversify,
  title     = {{Diversify and Disambiguate: Learning from Underspecified Data}},
  author    = {Lee, Yoonho and Yao, Huaxiu and Finn, Chelsea},
  booktitle = {ICML 2022 Workshops: SCIS},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/lee2022icmlw-diversify/}
}