Transformer Efficiently Learns Low-Dimensional Target Functions In-Context
Abstract
Transformers can efficiently learn in-context from example demonstrations, and existing theoretical analyses have studied this in-context learning (ICL) ability for linear function classes. However, this simplified linear setting arguably does not demonstrate the statistical efficiency of ICL, since the trained transformer does not outperform directly performing linear regression on the test prompt. We study ICL of a nonlinear function class of the form $f_*(\boldsymbol{x}) = \sigma_*(\langle\boldsymbol{x},\ \boldsymbol{\beta}\rangle)$, known as a single-index model, using a transformer with a nonlinear MLP layer. When the index features $\boldsymbol{\beta}\in\mathbb{R}^d$ are drawn from a rank-$r$ subspace, we show that a nonlinear transformer optimized by gradient descent learns $f_*$ in-context with a prompt length that depends only on the intrinsic dimension $r$ of the function class; in contrast, an algorithm that learns $f_*$ directly on the test prompt incurs a statistical complexity that scales with the ambient dimension $d$. Our result highlights the adaptivity of ICL to the low-dimensional structure of the function class.
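To make the setting concrete, here is a minimal sketch (not the authors' code) of how an in-context prompt for a single-index target with a low-dimensional index feature could be generated: the index feature $\boldsymbol{\beta}$ is constrained to a rank-$r$ subspace of $\mathbb{R}^d$, and each in-context example pairs a Gaussian input with its label $\sigma_*(\langle\boldsymbol{x}, \boldsymbol{\beta}\rangle)$. The specific link function, dimensions, and function names below are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation): constructing an
# in-context learning prompt for a single-index target f_*(x) = sigma_*(<x, beta>),
# where the index feature beta lies in a fixed rank-r subspace of R^d.
import numpy as np

def sample_icl_prompt(d=64, r=4, n_ctx=32, seed=None):
    rng = np.random.default_rng(seed)
    # Orthonormal basis U of a fixed rank-r subspace of R^d.
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))
    # Index feature beta = U z, so beta is confined to the r-dimensional subspace.
    z = rng.standard_normal(r)
    beta = U @ (z / np.linalg.norm(z))
    # Illustrative choice of link function sigma_* (an assumption; the paper
    # treats a general nonlinear link).
    sigma = lambda t: t ** 3 - 3 * t
    # In-context examples (x_i, y_i) plus a held-out query point.
    X = rng.standard_normal((n_ctx, d))
    y = sigma(X @ beta)
    x_query = rng.standard_normal(d)
    y_query = sigma(x_query @ beta)
    return X, y, x_query, y_query

X, y, x_q, y_q = sample_icl_prompt(seed=0)
print(X.shape, y.shape)  # (32, 64) (32,)
```

Under this setup, the paper's claim is that the prompt length $n_{\mathrm{ctx}}$ needed by the trained transformer scales with the subspace dimension $r$ rather than the ambient dimension $d$.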
Cite
Text
Song et al. "Transformer Efficiently Learns Low-Dimensional Target Functions In-Context." ICML 2024 Workshops: TF2M, 2024.
Markdown
[Song et al. "Transformer Efficiently Learns Low-Dimensional Target Functions In-Context." ICML 2024 Workshops: TF2M, 2024.](https://mlanthology.org/icmlw/2024/song2024icmlw-transformer/)
BibTeX
@inproceedings{song2024icmlw-transformer,
title = {{Transformer Efficiently Learns Low-Dimensional Target Functions In-Context}},
author = {Song, Yujin and Wu, Denny and Oko, Kazusato and Suzuki, Taiji},
booktitle = {ICML 2024 Workshops: TF2M},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/song2024icmlw-transformer/}
}