Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Abstract
When solving challenging problems, language models (LMs) can identify relevant information in long and complicated contexts. To study how LMs solve retrieval tasks in diverse situations, we introduce ORION, a collection of structured retrieval tasks spanning text understanding to coding. We apply causal analysis on ORION to 18 open-source language models with sizes ranging from 125 million to 70 billion parameters. We find that LMs internally decompose retrieval tasks in a modular way: middle layers at the last token position process the request, while late layers retrieve the correct entity from the context. Building on this high-level understanding, we demonstrate a proof-of-concept application for scalable internal oversight of LMs that mitigates prompt injection while requiring human supervision on only a single input.
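The modular decomposition suggests a concrete causal experiment: cache the activations of a source prompt, then patch its last-token residual stream at a middle layer into a run on a target prompt that shares the context but asks a different request. The sketch below illustrates such a request-patching experiment using the TransformerLens library; the GPT-2 model choice, the example prompts, and the middle-layer index are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal request-patching sketch (assumed setup, not the paper's exact one).
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
model.eval()

# Two retrieval prompts sharing a context but differing in the request.
source_prompt = "The box is red. The key is blue. The color of the key is"
target_prompt = "The box is red. The key is blue. The color of the box is"

# Cache all activations from the source prompt.
_, source_cache = model.run_with_cache(source_prompt)

middle_layer = model.cfg.n_layers // 2  # assumed "request-processing" layer
hook_name = f"blocks.{middle_layer}.hook_resid_post"

def patch_last_token(resid, hook):
    # Overwrite the last-token residual stream with the source activation,
    # transplanting the source *request* while keeping the target context.
    resid[:, -1, :] = source_cache[hook.name][:, -1, :]
    return resid

with torch.no_grad():
    patched_logits = model.run_with_hooks(
        target_prompt, fwd_hooks=[(hook_name, patch_last_token)]
    )

# If the decomposition holds, the patched run answers the source request
# ("key" -> "blue") even though the prompt asks about the box.
print(model.to_single_str_token(patched_logits[0, -1].argmax().item()))
```

The same mechanism underlies the oversight application: a request representation verified by a human on a single trusted input can be patched in to override whatever request an untrusted context tries to inject.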
Cite
Text
Variengien and Winsor. "Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models." ICML 2024 Workshops: MI, 2024.

Markdown
[Variengien and Winsor. "Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models." ICML 2024 Workshops: MI, 2024.](https://mlanthology.org/icmlw/2024/variengien2024icmlw-look/)

BibTeX
@inproceedings{variengien2024icmlw-look,
  title = {{Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models}},
  author = {Variengien, Alexandre and Winsor, Eric},
  booktitle = {ICML 2024 Workshops: MI},
  year = {2024},
  url = {https://mlanthology.org/icmlw/2024/variengien2024icmlw-look/}
}