Transformers Are Universal Predictors
Abstract
We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze its performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments on both synthetic and real datasets.
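As a point of reference (a standard formalization of universal prediction, not necessarily the paper's exact definition), a predictor $q$ is called universal with respect to a class $\mathcal{P}$ of sources if its per-symbol cumulative log-loss regret vanishes as the sequence length grows:

$$
\frac{1}{n}\left(\sum_{t=1}^{n} -\log q\!\left(x_t \mid x^{t-1}\right) \;-\; \min_{p \in \mathcal{P}} \sum_{t=1}^{n} -\log p\!\left(x_t \mid x^{t-1}\right)\right) \;\longrightarrow\; 0 ,
$$

i.e., the predictor asymptotically matches the best source in the class in terms of average log-loss.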
Cite
Text
Basu et al. "Transformers Are Universal Predictors." ICML 2023 Workshops: NCW, 2023.
Markdown
[Basu et al. "Transformers Are Universal Predictors." ICML 2023 Workshops: NCW, 2023.](https://mlanthology.org/icmlw/2023/basu2023icmlw-transformers/)
BibTeX
@inproceedings{basu2023icmlw-transformers,
  title = {{Transformers Are Universal Predictors}},
  author = {Basu, Sourya and Choraria, Moulik and Varshney, Lav R.},
  booktitle = {ICML 2023 Workshops: NCW},
  year = {2023},
  url = {https://mlanthology.org/icmlw/2023/basu2023icmlw-transformers/}
}