SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Abstract

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at https://super.gluebenchmark.com.

Cite

Text

Wang et al. "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems." Neural Information Processing Systems, 2019.

Markdown

[Wang et al. "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/wang2019neurips-superglue/)

BibTeX

@inproceedings{wang2019neurips-superglue,
  title     = {{SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems}},
  author    = {Wang, Alex and Pruksachatkun, Yada and Nangia, Nikita and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {3266--3280},
  url       = {https://mlanthology.org/neurips/2019/wang2019neurips-superglue/}
}