DISCO: Disentangled Communication Steering for Large Language Models

Abstract

A variety of recent methods guide large language model outputs via the inference-time addition of *steering vectors* to residual-stream or attention-head representations. In contrast, we propose to inject steering vectors directly into the query and value representation spaces within attention heads. We provide evidence that a greater portion of these spaces exhibits high linear discriminability of concepts (a key property motivating the use of steering vectors) than attention head outputs. We analytically characterize the effect of our method, which we term *DISentangled COmmunication (DISCO) Steering*, on attention head outputs. Our analysis reveals that DISCO disentangles a strong but underutilized baseline, steering attention head inputs, which implicitly modifies queries and values in a rigid manner. DISCO's direct modulation of these components instead enables more granular control. We find that DISCO achieves superior performance over a number of steering vector baselines across multiple datasets on LLaMA 3.1 8B and Gemma 2 9B, with steering efficacy scoring up to 19.1% higher than the runner-up. Our results support the conclusion that the query and value spaces are powerful building blocks for steering vector methods. Our code is publicly available at https://github.com/MaxTorop/DISCO.
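To make the contrast between steering head inputs and steering queries and values concrete, the following is a minimal NumPy sketch on a single toy attention head. It is not the authors' implementation: the function `head_output`, the random steering vectors `s_q`, `s_v`, `s_in`, and the strengths `alpha_q`, `alpha_v`, `a` are all hypothetical names chosen for illustration, and the head is simplified (no biases, no positional encoding, no causal mask, the same vector added at every token position). Under those assumptions, adding one vector to the head input shifts queries and values by amounts tied to a single strength, whereas shifting queries and values directly lets each be scaled independently.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def head_output(x, W_q, W_k, W_v, q_shift=None, v_shift=None):
    # Toy single-head attention (assumption: no bias, mask, or positional encoding).
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    if q_shift is not None:
        q = q + q_shift          # steer the query space directly
    if v_shift is not None:
        v = v + v_shift          # steer the value space directly
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 16, 4
x = rng.normal(size=(n_tokens, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

# Placeholder steering vectors; in practice these would be estimated from data.
s_q, s_v = rng.normal(size=d_head), rng.normal(size=d_head)
alpha_q, alpha_v = 2.0, 0.5      # independent strengths for queries and values

base = head_output(x, W_q, W_k, W_v)
value_only = head_output(x, W_q, W_k, W_v, v_shift=alpha_v * s_v)
both = head_output(x, W_q, W_k, W_v,
                   q_shift=alpha_q * s_q, v_shift=alpha_v * s_v)

# Input steering: one vector and one strength, pushed through W_q and W_v.
s_in, a = rng.normal(size=d_model), 1.5
input_steered = head_output(x + a * s_in, W_q, W_k, W_v)
coupled = head_output(x, W_q, W_k, W_v,
                      q_shift=(a * s_in) @ W_q, v_shift=(a * s_in) @ W_v)
```

In this toy setting, `input_steered` and `coupled` agree, because a constant shift of every key adds the same amount to each score in a row and so leaves the softmax unchanged. A value-only shift moves every output row by exactly `alpha_v * s_v`, since attention weights in each row sum to one, while a query shift acts only through the attention pattern.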

Cite

Text

Torop et al. "DISCO: Disentangled Communication Steering for Large Language Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Torop et al. "DISCO: Disentangled Communication Steering for Large Language Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/torop2025neurips-disco/)

BibTeX

@inproceedings{torop2025neurips-disco,
  title     = {{DISCO: Disentangled Communication Steering for Large Language Models}},
  author    = {Torop, Max and Masoomi, Aria and Eskandar, Masih and Dy, Jennifer},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/torop2025neurips-disco/}
}