ASIDE: Architectural Separation of Instructions and Data in Language Models
Abstract
Despite their remarkable performance, large language models lack elementary safety features, making them susceptible to numerous malicious attacks. In particular, previous work has identified the absence of an intrinsic separation between instructions and data as the root cause of the success of prompt injection attacks. In this work, we propose a new architectural element, ASIDE, that allows language models to clearly separate instructions and data at the level of token embeddings. ASIDE applies an orthogonal rotation to the embeddings of data tokens, thus creating clearly distinct representations of instructions and data tokens without introducing any additional parameters. As we demonstrate experimentally across a range of models, instruction-tuning LLMs with ASIDE (1) achieves substantially higher instruction-data separation without performance loss and (2) makes the models more robust to prompt injection benchmarks, even without dedicated safety training. Additionally, we provide insights into the mechanism underlying our method through an analysis of the model representations.
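The core idea in the abstract, rotating the embeddings of data tokens while leaving instruction tokens untouched, can be illustrated with a minimal sketch. The abstract only states that the rotation is orthogonal. The specific rotation used here (a 90-degree turn of each consecutive coordinate pair), the function names `rotate_pairs` and `embed_tokens`, and the toy embedding table are all assumptions made for illustration, not the authors' implementation. A real model would apply this as a tensor operation on the embedding layer output.

```python
from typing import List, Sequence

Vector = List[float]


def rotate_pairs(vec: Sequence[float]) -> Vector:
    """Apply a fixed orthogonal rotation: each pair (a, b) becomes (-b, a).

    This is a 90-degree rotation in every 2D coordinate plane. It has no
    learnable parameters and preserves vector norms. (Illustrative choice.)
    """
    if len(vec) % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out: Vector = []
    for i in range(0, len(vec), 2):
        a, b = vec[i], vec[i + 1]
        out.extend([-b, a])
    return out


def embed_tokens(
    token_ids: Sequence[int],
    is_data: Sequence[bool],
    table: Sequence[Sequence[float]],
) -> List[Vector]:
    """Look up embeddings; rotate those belonging to data tokens."""
    if len(token_ids) != len(is_data):
        raise ValueError("token_ids and is_data must have the same length")
    result: List[Vector] = []
    for tok, data_flag in zip(token_ids, is_data):
        vec = list(table[tok])
        result.append(rotate_pairs(vec) if data_flag else vec)
    return result


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product, used to check orthogonality and norms."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy embedding table with two tokens of dimension 4.
toy_table = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]

# The same token (id 0) appears once as an instruction and once as data.
as_instruction, as_data = embed_tokens([0, 0], [False, True], toy_table)
# as_instruction == [1.0, 2.0, 3.0, 4.0]
# as_data        == [-2.0, 1.0, -4.0, 3.0]
# dot(as_instruction, as_data) == 0.0, and both have squared norm 30.0
```

With this particular rotation, the same token receives two representations that are orthogonal to each other and equal in norm, depending only on whether it is marked as instruction or data. That is one simple way to obtain distinct representations without adding parameters.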
Cite
Text
Zverev et al. "ASIDE: Architectural Separation of Instructions and Data in Language Models." International Conference on Learning Representations, 2026.
Markdown
[Zverev et al. "ASIDE: Architectural Separation of Instructions and Data in Language Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zverev2026iclr-aside/)
BibTeX
@inproceedings{zverev2026iclr-aside,
title = {{ASIDE: Architectural Separation of Instructions and Data in Language Models}},
author = {Zverev, Egor and Kortukov, Evgenii and Panfilov, Alexander and Volkova, Alexandra and Tabesh, Rush and Lapuschkin, Sebastian and Samek, Wojciech and Lampert, Christoph H.},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zverev2026iclr-aside/}
}