Comparing Bottom-up and Top-Down Steering Approaches on In-Context Learning Tasks
Abstract
A key objective of interpretability research on large language models (LLMs) is to develop methods for robustly steering models toward desired behaviors. To this end, two distinct approaches to interpretability, "bottom-up" and "top-down", have been proposed, but there has been little quantitative comparison between them. We present a case study comparing the effectiveness of representative vector steering methods from each branch: function vectors (FVs) [Todd et al., 2024] as a bottom-up method, and in-context vectors (ICVs) [Liu et al., 2024] as a top-down method. While both aim to capture compact representations of broad in-context learning tasks, we find that each is effective only on specific types of tasks: ICVs outperform FVs at shifting model behavior, whereas FVs excel on tasks requiring more precision. Given these findings, we discuss the implications for future evaluations of steering methods and for further research into top-down and bottom-up steering.
Cite
Text
Brumley et al. "Comparing Bottom-up and Top-Down Steering Approaches on In-Context Learning Tasks." NeurIPS 2024 Workshops: MINT, 2024.
Markdown
[Brumley et al. "Comparing Bottom-up and Top-Down Steering Approaches on In-Context Learning Tasks." NeurIPS 2024 Workshops: MINT, 2024.](https://mlanthology.org/neuripsw/2024/brumley2024neuripsw-comparing/)
BibTeX
@inproceedings{brumley2024neuripsw-comparing,
title = {{Comparing Bottom-up and Top-Down Steering Approaches on In-Context Learning Tasks}},
author = {Brumley, Madeline and Kwon, Joe and Krueger, David and Krasheninnikov, Dmitrii and Anwar, Usman},
booktitle = {NeurIPS 2024 Workshops: MINT},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/brumley2024neuripsw-comparing/}
}