Selective Rotary Position Embedding
Abstract
Positional information is essential for language modeling. Softmax Transformers with Rotary Position Embeddings (RoPE) encode it with fixed-angle rotations, while linear Transformers rely on input-dependent gates that only decay past key-value norms. We provide a theoretical argument for the necessity of a rotation and decay component in well-performing sequence models, and observe that the missing ingredient in linear models is precisely the rotation that softmax attention performs implicitly. We introduce Selective Rotary Position Embedding (*Selective RoPE*), an input-dependent, learnable rotary embedding that generalizes RoPE to arbitrary angles and composes seamlessly with decay gates. Equipping gated linear attention with *Selective RoPE* yields a complex-valued recurrent layer that can be implemented efficiently with the “RoPE trick”. On synthetic benchmarks (MQAR, copying, state tracking) and 370M-parameter language-model pre-training, the method improves recall, downstream accuracy, and expressivity while adding minimal architectural overhead. We open-source our implementation [here](https://github.com/timurcarstensen/selective-rope).
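As a rough illustration of the idea described in the abstract, the following NumPy sketch shows how an input-dependent rotation can be combined with a decay gate in a linear-attention recurrence, and how the same output can be obtained by rotating queries and keys with cumulative angles (one reading of the "RoPE trick"). It is not the authors' implementation. The function names, the scalar sigmoid gate, and the linear projection that produces the angles are all assumptions made for this example.

```python
import numpy as np

def selective_rope_recurrent(q, k, v, omega, a):
    """Recurrent form (illustrative): rotate and decay a complex state, then add k v^T."""
    T, d = q.shape
    state = np.zeros((d, v.shape[1]), dtype=np.complex128)
    out = np.zeros_like(v)
    for t in range(T):
        rot = np.exp(1j * omega[t])[:, None]           # input-dependent rotation per channel
        state = a[t] * rot * state + np.outer(k[t], v[t])
        out[t] = np.real(np.conj(q[t]) @ state)        # read out with the query
    return out

def selective_rope_parallel(q, k, v, omega, a):
    """Parallel form (illustrative): rotate q and k by cumulative angles, then apply causal decay."""
    T = q.shape[0]
    phi = np.cumsum(omega, axis=0)                     # cumulative angle up to each step
    q_rot = q * np.exp(-1j * phi)
    k_rot = k * np.exp(-1j * phi)
    scores = np.real(np.conj(q_rot) @ k_rot.T)         # scores[t, s] depends on phi[t] - phi[s]
    log_a = np.cumsum(np.log(a))
    decay = np.exp(log_a[:, None] - log_a[None, :])    # product of gates a[s+1..t]
    causal = np.tril(np.ones((T, T), dtype=bool))
    return np.where(causal, scores * decay, 0.0) @ v

# Hypothetical toy inputs: all projections below are random stand-ins for learned weights.
rng = np.random.default_rng(0)
T, d_model, d, d_v = 8, 16, 4, 5
x = rng.standard_normal((T, d_model))

def proj(*shape):
    return rng.standard_normal(shape) / np.sqrt(d_model)

q = x @ proj(d_model, d) + 1j * (x @ proj(d_model, d))
k = x @ proj(d_model, d) + 1j * (x @ proj(d_model, d))
v = x @ proj(d_model, d_v)
omega = x @ proj(d_model, d)                           # angles depend on the input (assumed linear map)
a = 1.0 / (1.0 + np.exp(-(x @ proj(d_model))))         # scalar decay gate in (0, 1) (assumed sigmoid)

out_rec = selective_rope_recurrent(q, k, v, omega, a)
out_par = selective_rope_parallel(q, k, v, omega, a)
```

In this sketch, setting `omega` to the same vector at every step makes the cumulative angle grow linearly with position, which corresponds to the fixed-angle rotations of standard RoPE, and setting `a` to all ones removes the decay.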
Cite
Text
Movahedi et al. "Selective Rotary Position Embedding." International Conference on Learning Representations, 2026.
Markdown
[Movahedi et al. "Selective Rotary Position Embedding." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/movahedi2026iclr-selective/)
BibTeX
@inproceedings{movahedi2026iclr-selective,
title = {{Selective Rotary Position Embedding}},
author = {Movahedi, Sajad and Carstensen, Timur and Afzal, Arshia and Hutter, Frank and Orvieto, Antonio and Cevher, Volkan},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/movahedi2026iclr-selective/}
}