Fine-Grained Analysis of In-Context Linear Estimation

Abstract

In this work, we develop a stronger characterization of the optimization and generalization landscape of in-context learning (ICL) through contributions on architectures, low-rank parameterization, and correlated designs: (1) We study the landscape of 1-layer linear attention and 1-layer H3, a state-space model. Under a suitable correlated design assumption, we prove that both implement 1-step preconditioned gradient descent. (2) By studying correlated designs, we provide new risk bounds for retrieval-augmented generation (RAG) and task-feature alignment, which reveal how ICL sample complexity benefits from distributional alignment. (3) We derive the optimal risk for low-rank parameterized attention weights in terms of the covariance spectrum. Through this, we also shed light on how LoRA can adapt to a new distribution by capturing the shift between task covariances.
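As a rough illustration of point (1), the following minimal numerical sketch (not the authors' code) shows how a 1-layer linear attention prediction on an in-context linear regression prompt can be read as one step of preconditioned gradient descent; the preconditioner Gamma stands in for a hypothetical merged key-query matrix.

import numpy as np

# Minimal sketch: 1-layer linear attention on a prompt (x_1, y_1, ..., x_n, y_n, x_query).
rng = np.random.default_rng(0)
d, n = 5, 20
X = rng.normal(size=(n, d))        # in-context features
beta = rng.normal(size=d)          # latent task vector
y = X @ beta                       # noiseless labels, for illustration only
x_query = rng.normal(size=d)

Gamma = rng.normal(size=(d, d))    # hypothetical preconditioner (merged key-query matrix)

# Collapsed linear attention prediction: the query token attends to the
# (x_i, y_i) pairs, giving x_query^T Gamma (1/n) sum_i y_i x_i.
attn_pred = x_query @ Gamma @ (X.T @ y) / n

# One step of preconditioned gradient descent on the in-context least-squares
# loss L(b) = (1/2n) sum_i (y_i - x_i^T b)^2, started at b = 0 with unit step:
# b_1 = Gamma (1/n) X^T y, so the prediction is x_query^T b_1.
gd_pred = x_query @ (Gamma @ (X.T @ y) / n)

print(np.allclose(attn_pred, gd_pred))  # True: the two predictions coincide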

Cite

Text

Li et al. "Fine-Grained Analysis of In-Context Linear Estimation." ICML 2024 Workshops: HiLD, 2024.

Markdown

[Li et al. "Fine-Grained Analysis of In-Context Linear Estimation." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/li2024icmlw-finegrained/)

BibTeX

@inproceedings{li2024icmlw-finegrained,
  title     = {{Fine-Grained Analysis of In-Context Linear Estimation}},
  author    = {Li, Yingcong and Rawat, Ankit Singh and Oymak, Samet},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/li2024icmlw-finegrained/}
}