Task Descriptors Help Transformers Learn Linear Models In-Context
Abstract
Large language models (LLMs) exhibit strong in-context learning (ICL) ability, which allows the model to make predictions on new examples based on the given prompt. Recently, a line of research (Von Oswald et al., 2023; Akyürek et al., 2023; Ahn et al., 2023; Mahankali et al., 2023; Zhang et al., 2023) considered ICL in a simple linear regression setting and showed that the forward pass of Transformers simulates variants of gradient descent (GD) on the in-context examples. In practice, the input prompt usually contains two types of information: in-context examples and a task description. In this work, we theoretically investigate how the task description helps ICL. Specifically, our input prompt contains not only in-context examples but also a "task descriptor". We empirically show that the trained transformer achieves significantly lower ICL loss when the task descriptor is provided. We further prove a global convergence theorem whose converged parameters match our experimental results.
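To make the setting concrete, below is a minimal NumPy sketch of the linear regression ICL setup described above: in-context examples plus a query, with a task descriptor prepended to the prompt. The descriptor's form (a noisy hint about the ground-truth weights), the prompt layout, and the idea of using the descriptor as a warm start for a single GD step are illustrative assumptions only, not the paper's construction; the GD step itself reflects the cited results that a trained transformer's forward pass emulates GD on the in-context examples.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20                      # feature dimension, number of in-context examples

# Linear regression task: y = <w_star, x> (noise omitted for simplicity).
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))       # in-context inputs x_1, ..., x_n
y = X @ w_star                    # in-context labels y_1, ..., y_n
x_query = rng.normal(size=d)      # query input whose label must be predicted

# Hypothetical task descriptor: here, a noisy hint about w_star.
# Its exact form and encoding are assumptions for illustration only.
descriptor = w_star + 0.1 * rng.normal(size=d)

# One possible prompt layout: a descriptor token, then (x_i, y_i) example
# tokens, then the query token with an empty label slot. The prompt would
# be fed to the transformer; here we only emulate the GD step below.
prompt = np.vstack([
    np.append(descriptor, 0.0),          # task-descriptor token
    np.hstack([X, y[:, None]]),          # in-context example tokens
    np.append(x_query, 0.0),             # query token (label unknown)
])

# One GD step on the in-context squared loss. Without a descriptor the
# natural initialization is w = 0; with a descriptor, one plausible
# (assumed) use is as a warm start.
eta = 1.0 / n

def one_gd_step(w0):
    grad = X.T @ (X @ w0 - y) / n
    return w0 - eta * grad

print("one GD step from zero init:      ", one_gd_step(np.zeros(d)) @ x_query)
print("one GD step from descriptor init:", one_gd_step(descriptor) @ x_query)
print("ground-truth label:              ", w_star @ x_query)
```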
Cite
Text
Huang and Ge. "Task Descriptors Help Transformers Learn Linear Models In-Context." ICML 2024 Workshops: ICL, 2024.
Markdown
[Huang and Ge. "Task Descriptors Help Transformers Learn Linear Models In-Context." ICML 2024 Workshops: ICL, 2024.](https://mlanthology.org/icmlw/2024/huang2024icmlw-task/)
BibTeX
@inproceedings{huang2024icmlw-task,
title = {{Task Descriptors Help Transformers Learn Linear Models In-Context}},
author = {Huang, Ruomin and Ge, Rong},
booktitle = {ICML 2024 Workshops: ICL},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/huang2024icmlw-task/}
}