In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-Separation

Abstract

This paper investigates approximation-theoretic aspects of the in-context learning capability of transformers in representing a family of noisy linear dynamical systems. Our first theoretical result establishes an upper bound on the approximation error of multi-layer transformers with respect to an $L^2$-testing loss defined uniformly across tasks. This result demonstrates that transformers with logarithmic depth can achieve error bounds comparable to those of the least-squares estimator. In contrast, our second result establishes a non-diminishing lower bound on the approximation error for a class of single-layer linear transformers, suggesting a depth-separation phenomenon for transformers in the in-context learning of dynamical systems. Moreover, this second result uncovers a critical distinction in the approximation power of single-layer linear transformers when learning from IID versus non-IID data.
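
As a point of reference for the least-squares baseline the abstract compares against, the following is a minimal NumPy sketch (not from the paper; the state dimension, trajectory length, noise level, and stability normalization are illustrative assumptions) of estimating the transition matrix of a noisy linear dynamical system $x_{t+1} = A^* x_t + w_t$ from a single trajectory:

```python
import numpy as np

rng = np.random.default_rng(0)

d, T = 3, 200     # state dimension and trajectory length (illustrative choices)
sigma = 0.1       # noise level (assumed)

# Sample a stable ground-truth transition matrix A* (spectral radius < 1).
A_star = rng.standard_normal((d, d))
A_star *= 0.9 / max(abs(np.linalg.eigvals(A_star)))

# Simulate the noisy linear dynamical system x_{t+1} = A* x_t + w_t.
X = np.zeros((T + 1, d))
X[0] = rng.standard_normal(d)
for t in range(T):
    X[t + 1] = A_star @ X[t] + sigma * rng.standard_normal(d)

# Least-squares estimator: A_hat = argmin_A sum_t ||x_{t+1} - A x_t||^2,
# solved as a stacked linear regression with np.linalg.lstsq.
A_hat_T, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
A_hat = A_hat_T.T

print("estimation error (Frobenius):", np.linalg.norm(A_hat - A_star, ord="fro"))
```

This is only the classical baseline; the paper's upper bound concerns multi-layer transformers whose in-context predictions match the error rates of such an estimator.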

Cite

Text

Cole et al. "In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-Separation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Cole et al. "In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-Separation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/cole2025neurips-incontext/)

BibTeX

@inproceedings{cole2025neurips-incontext,
  title     = {{In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-Separation}},
  author    = {Cole, Frank and Zhao, Yuxuan and Lu, Yulong and Zhang, Tianhao},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/cole2025neurips-incontext/}
}