On the Role of Attention in Prompt-Tuning
Abstract
Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mixture models where each input token belongs to a context-relevant or context-irrelevant set. We isolate the role of prompt-tuning through a self-contained prompt-attention model. Our contributions are as follows: (1) We show that softmax-prompt-attention is provably more expressive than softmax-self-attention and linear-prompt-attention under our contextual data model. (2) We analyze the initial trajectory of gradient descent, show that it learns the prompt and prediction head with near-optimal sample complexity, and demonstrate how the prompt can provably attend to sparse context-relevant tokens. We also provide experiments on real datasets that verify our theoretical insights and demonstrate how prompt-tuning enables the model to attend to context-relevant information.
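The abstract describes the prompt-attention model only at a high level. As a rough, non-authoritative illustration, the sketch below assumes a predictor in which a single query vector `q` (standing in for the learned prompt after the key-query projection) scores each token. The softmax of those scores pools the tokens, and a linear head `v` produces the prediction. The toy data generator `sample_contextual`, the directions `e0`/`e1`, and all constants are hypothetical choices made for this illustration. They are not the paper's exact data model or parameterization.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D vector of scores.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def prompt_attention(X, q, v):
    """Hypothetical one-layer prompt-attention predictor (illustrative sketch).

    X: (T, d) input tokens; q: (d,) query derived from the learned prompt;
    v: (d,) linear prediction head. Returns (scalar prediction, attention weights).
    """
    scores = X @ q          # relevance score <x_t, q> for each token
    a = softmax(scores)     # attention distribution over the T tokens
    pooled = X.T @ a        # attention-weighted average of the tokens, shape (d,)
    return float(v @ pooled), a


def sample_contextual(T, d, n_relevant, y, rng, signal=3.0, noise=0.3):
    """Toy contextual mixture (an assumption, not the paper's exact model).

    n_relevant randomly placed tokens carry a shared context direction e_0 plus
    a label-dependent direction y * e_1; every token gets isotropic Gaussian noise.
    """
    X = noise * rng.standard_normal((T, d))
    idx = rng.choice(T, size=n_relevant, replace=False)
    X[idx, 0] += signal          # context direction marks the relevant tokens
    X[idx, 1] += y * signal      # label information lives only in relevant tokens
    return X, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 20, 8
    e0, e1 = np.eye(d)[0], np.eye(d)[1]
    for y in (+1, -1):
        X, idx = sample_contextual(T, d, n_relevant=2, y=y, rng=rng)
        pred, a = prompt_attention(X, q=5.0 * e0, v=e1)
        print(f"y={y:+d} pred={pred:+.2f} relevant mass={a[idx].sum():.3f}")
```

Setting `q` to zero makes the attention uniform, so each weight is exactly 1/T and the two relevant tokens receive only 2/T of the mass. Aligning `q` with the context direction concentrates nearly all of the mass on them. This is only a toy picture of the claim that a prompt can attend to sparse context-relevant tokens. It does not reproduce the paper's expressivity or gradient-descent analysis.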
Cite
Oymak et al. "On the Role of Attention in Prompt-Tuning." ICLR 2023 Workshops: ME-FoMo, 2023.
@inproceedings{oymak2023iclrw-role,
title = {{On the Role of Attention in Prompt-Tuning}},
author = {Oymak, Samet and Rawat, Ankit Singh and Soltanolkotabi, Mahdi and Thrampoulidis, Christos},
booktitle = {ICLR 2023 Workshops: ME-FoMo},
year = {2023},
url = {https://mlanthology.org/iclrw/2023/oymak2023iclrw-role/}
}