In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

Abstract

We study the \emph{in-context learning} (ICL) ability of a \emph{Linear Transformer Block} (LTB) that combines a linear attention component and a linear multi-layer perceptron (MLP) component. For ICL of linear regression with a Gaussian prior and a \emph{non-zero mean}, we show that LTB can achieve nearly Bayes-optimal ICL risk. In contrast, using only linear attention must incur an irreducible additive approximation error. Furthermore, we establish a correspondence between LTB and one-step gradient descent estimators with learnable initialization ($\mathsf{GD}-\beta$), in the sense that every $\mathsf{GD}-\beta$ estimator can be implemented by an LTB estimator and every optimal LTB estimator that minimizes the in-class ICL risk is effectively a $\mathsf{GD}-\beta$ estimator. Finally, we show that $\mathsf{GD}-\beta$ estimators can be efficiently optimized with gradient flow, despite a non-convex training objective. Our results reveal that LTB achieves ICL by implementing $\mathsf{GD}-\beta$, and they highlight the role of MLP layers in reducing approximation error.
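To make the idea of a one-step gradient descent estimator with learnable initialization concrete, here is a minimal NumPy sketch. It assumes the estimator takes one preconditioned gradient step on the in-context least-squares loss, starting from a vector `beta`, and then predicts the query label with the updated weights. The function name `gd_beta_predict`, the matrix `Gamma`, and the toy data are illustrative assumptions, not the paper's notation or code.

```python
import numpy as np


def gd_beta_predict(X, y, x_query, beta, Gamma):
    """Predict the query label after one GD step from initialization beta.

    Assumed form (illustrative, not taken verbatim from the paper):
        beta_1 = beta - Gamma @ (X^T (X beta - y) / n)
        y_hat  = <beta_1, x_query>
    """
    n = X.shape[0]
    # Gradient of the loss (1 / (2n)) * ||X beta - y||^2 at beta.
    grad = X.T @ (X @ beta - y) / n
    # Single preconditioned step; Gamma plays the role of a learnable step size.
    beta_one_step = beta - Gamma @ grad
    return float(x_query @ beta_one_step)


# Toy task: the task vector is drawn around a non-zero prior mean (assumed values).
rng = np.random.default_rng(0)
d, n = 5, 200
prior_mean = np.full(d, 2.0)
w = prior_mean + rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w + 0.1 * rng.standard_normal(n)
x_query = rng.standard_normal(d)

# Same step, two initializations: the origin versus the prior mean.
pred_zero_init = gd_beta_predict(X, y, x_query, np.zeros(d), np.eye(d))
pred_mean_init = gd_beta_predict(X, y, x_query, prior_mean, np.eye(d))
```

Setting `beta` to zeros gives plain one-step gradient descent from the origin, which is a natural baseline to compare against when the prior mean is non-zero. In a trained model both `beta` and `Gamma` would be learned rather than fixed by hand as they are here.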

Cite

Text

Zhang et al. "In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization." Neural Information Processing Systems, 2024. doi:10.52202/079017-0581

Markdown

[Zhang et al. "In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/zhang2024neurips-incontext/) doi:10.52202/079017-0581

BibTeX

@inproceedings{zhang2024neurips-incontext,
  title     = {{In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization}},
  author    = {Zhang, Ruiqi and Wu, Jingfeng and Bartlett, Peter L.},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0581},
  url       = {https://mlanthology.org/neurips/2024/zhang2024neurips-incontext/}
}