Prompting a Pretrained Transformer Can Be a Universal Approximator

Abstract

Despite the widespread adoption of prompting, prompt tuning, and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it. Formally, the question is whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, prefix-tuning a single attention head is sufficient to approximate any continuous function, making the attention mechanism uniquely suited for universal approximation. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.
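
To make the setting concrete, below is a minimal sketch of prefix-tuning a single attention head: trainable prefix key/value vectors are prepended to the keys and values computed from the input, while the pretrained projection weights stay frozen. This is an illustrative assumption of the general prefix-tuning setup, not the authors' construction or proof; the class name, shapes, and initialization are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixedAttentionHead(nn.Module):
    """Single attention head with a trainable prefix (illustrative sketch)."""

    def __init__(self, d_model: int, prefix_len: int):
        super().__init__()
        # "Pretrained" projections: frozen during prefix-tuning.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        for proj in (self.w_q, self.w_k, self.w_v):
            proj.weight.requires_grad_(False)
        # The only trainable parameters: prefix keys and values.
        self.prefix_k = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, d_model); single head, no batch dimension for clarity.
        q = self.w_q(x)
        k = torch.cat([self.prefix_k, self.w_k(x)], dim=0)  # prepend prefix keys
        v = torch.cat([self.prefix_v, self.w_v(x)], dim=0)  # prepend prefix values
        attn = F.softmax(q @ k.T * self.scale, dim=-1)       # (seq_len, prefix_len + seq_len)
        return attn @ v

head = PrefixedAttentionHead(d_model=16, prefix_len=8)
out = head(torch.randn(4, 16))  # 4 input tokens -> 4 output vectors
print(out.shape)                # torch.Size([4, 16])
```

Under this setup, only `prefix_k` and `prefix_v` would be optimized, which is the regime the paper's approximation results concern: how expressive the frozen head becomes as the prefix length grows.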

Cite

Text

Petrov et al. "Prompting a Pretrained Transformer Can Be a Universal Approximator." International Conference on Machine Learning, 2024.

Markdown

[Petrov et al. "Prompting a Pretrained Transformer Can Be a Universal Approximator." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/petrov2024icml-prompting/)

BibTeX

@inproceedings{petrov2024icml-prompting,
  title     = {{Prompting a Pretrained Transformer Can Be a Universal Approximator}},
  author    = {Petrov, Aleksandar and Torr, Philip and Bibi, Adel},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {40523--40550},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/petrov2024icml-prompting/}
}