Fine-Grained Analysis of In-Context Linear Estimation
Abstract
In this work, we develop a stronger characterization of the optimization and generalization landscape of in-context learning (ICL) through contributions on architectures, low-rank parameterization, and correlated designs: (1) We study the landscape of 1-layer linear attention and 1-layer H3, a state-space model. Under a suitable correlated design assumption, we prove that both implement 1-step preconditioned gradient descent. (2) By studying correlated designs, we provide new risk bounds for retrieval-augmented generation (RAG) and task-feature alignment that reveal how ICL sample complexity benefits from distributional alignment. (3) We derive the optimal risk for low-rank parameterized attention weights in terms of the covariance spectrum. Through this, we also shed light on how LoRA can adapt to a new distribution by capturing the shift between task covariances.
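A minimal sketch of what "1-step preconditioned gradient descent" means here, assuming the standard in-context linear estimation setup used in this line of work (the notation below, including the preconditioner Γ, is illustrative and not taken verbatim from the paper): given a prompt (x_1, y_1, ..., x_n, y_n, x_q) with labels y_i = x_i^⊤β, one preconditioned gradient step from β_0 = 0 on the in-context least-squares loss yields the predictor

```latex
% In-context least-squares loss over the prompt examples:
%   L(\beta) = \tfrac{1}{2n}\sum_{i=1}^{n} (y_i - x_i^\top \beta)^2
% One preconditioned gradient step from \beta_0 = 0 with preconditioner \Gamma:
\beta_1 \;=\; \beta_0 - \Gamma\,\nabla L(\beta_0) \;=\; \Gamma\,\frac{1}{n}\sum_{i=1}^{n} x_i y_i,
\qquad
\hat{y}_q \;=\; x_q^\top \beta_1 \;=\; x_q^\top \Gamma\,\frac{1}{n}\sum_{i=1}^{n} x_i y_i .
```

The abstract's claim is that 1-layer linear attention, and under the stated correlated design assumption also 1-layer H3, realizes a predictor of this form, with the learned weights playing the role of the preconditioner Γ.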
Cite
Text
Li et al. "Fine-Grained Analysis of In-Context Linear Estimation." ICML 2024 Workshops: HiLD, 2024.

Markdown

[Li et al. "Fine-Grained Analysis of In-Context Linear Estimation." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/li2024icmlw-finegrained/)

BibTeX
@inproceedings{li2024icmlw-finegrained,
title = {{Fine-Grained Analysis of In-Context Linear Estimation}},
author = {Li, Yingcong and Rawat, Ankit Singh and Oymak, Samet},
booktitle = {ICML 2024 Workshops: HiLD},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/li2024icmlw-finegrained/}
}