Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models

Abstract

When solving challenging problems, language models (LMs) are able to identify relevant information from long and complicated contexts. To study how LMs solve retrieval tasks in diverse situations, we introduce ORION, a collection of structured retrieval tasks spanning six domains, from text understanding to coding. We apply causal analysis on ORION to 18 open-source language models ranging in size from 125 million to 70 billion parameters. We find that LMs internally decompose retrieval tasks in a modular way: middle layers at the last token position process the request, while late layers retrieve the correct entity from the context. Building on our high-level understanding, we demonstrate a proof-of-concept application for scalable internal oversight of LMs to mitigate prompt injection while requiring human supervision on only a single input.

Cite

Text

Variengien and Winsor. "Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models." ICML 2024 Workshops: MI, 2024.

Markdown

[Variengien and Winsor. "Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models." ICML 2024 Workshops: MI, 2024.](https://mlanthology.org/icmlw/2024/variengien2024icmlw-look/)

BibTeX

@inproceedings{variengien2024icmlw-look,
  title     = {{Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models}},
  author    = {Variengien, Alexandre and Winsor, Eric},
  booktitle = {ICML 2024 Workshops: MI},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/variengien2024icmlw-look/}
}