Selective Rotary Position Embedding

Abstract

Positional information is essential for language modeling. Softmax Transformers with Rotary Position Embeddings (RoPE) encode it with fixed-angle rotations, while linear Transformers rely on input-dependent gates that only decay past key-value norms. We provide a theoretical argument that well-performing sequence models need both a rotation and a decay component, and observe that the missing ingredient in linear models is precisely the rotation that softmax attention performs implicitly. We introduce Selective Rotary Position Embedding (*Selective RoPE*), an input-dependent, learnable rotary embedding that generalizes RoPE to arbitrary angles and composes seamlessly with decay gates. Equipping gated linear attention with *Selective RoPE* yields a complex-valued recurrent layer that can be implemented efficiently with the “RoPE trick”. On synthetic benchmarks (MQAR, copying, state tracking) and 370M-parameter language-model pre-training, the method improves recall, downstream accuracy, and expressivity while adding minimal architectural overhead. We open-source our implementation [here](https://github.com/timurcarstensen/selective-rope).
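To make the idea of a complex-valued recurrent layer with rotation and decay more concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not specify the exact parameterization. The sketch assumes a per-channel state update `S_t = exp(log_a_t + i * theta_t) * S_{t-1} + k_t v_t^T` with output `Re(conj(q_t) @ S_t)`, where `theta` and `log_a` stand in for input-dependent angles and log-decay gates. The function names `recurrent_form` and `rotated_form` are hypothetical. Under these assumptions, the recurrence gives the same output as rotating queries and keys by the cumulative angles and applying a causal, decay-weighted linear attention, which is one way to read the “RoPE trick”.

```python
import numpy as np

def recurrent_form(q, k, v, theta, log_a):
    """Step-by-step complex recurrence (assumed form, for illustration only).

    q, k: complex arrays of shape (T, d); v: real array of shape (T, dv);
    theta, log_a: real arrays of shape (T, d) with log_a <= 0.
    """
    T, d = q.shape
    S = np.zeros((d, v.shape[1]), dtype=np.complex128)
    out = np.zeros((T, v.shape[1]))
    for t in range(T):
        # Per-channel transition: decay magnitude times a rotation.
        transition = np.exp(log_a[t] + 1j * theta[t])
        S = transition[:, None] * S + k[t][:, None] * v[t][None, :]
        out[t] = np.real(np.conj(q[t]) @ S)
    return out

def rotated_form(q, k, v, theta, log_a):
    """Same output, computed by rotating q and k with cumulative angles."""
    phi = np.cumsum(theta, axis=0)    # cumulative rotation angles, (T, d)
    A = np.cumsum(log_a, axis=0)      # cumulative log-decay, (T, d)
    q_rot = q * np.exp(-1j * phi)
    k_rot = k * np.exp(-1j * phi)
    # decay[t, s, c] = exp(A[t, c] - A[s, c]); only s <= t is used below.
    decay = np.exp(A[:, None, :] - A[None, :, :])
    scores = np.real(np.einsum("tc,sc,tsc->ts", np.conj(q_rot), k_rot, decay))
    scores = np.tril(scores)          # causal mask
    return scores @ v

# Usage: compare both forms on small random inputs.
rng = np.random.default_rng(0)
T, d, dv = 6, 4, 3
q = rng.normal(size=(T, d)) + 1j * rng.normal(size=(T, d))
k = rng.normal(size=(T, d)) + 1j * rng.normal(size=(T, d))
v = rng.normal(size=(T, dv))
theta = rng.normal(size=(T, d))            # stand-in for input-dependent angles
log_a = -np.abs(rng.normal(size=(T, d)))   # stand-in for log-decay gates
print(np.allclose(recurrent_form(q, k, v, theta, log_a),
                  rotated_form(q, k, v, theta, log_a)))
```

In this sketch, setting `theta[t]` to the same frequency vector at every step makes the cumulative angle grow linearly with position, which corresponds to the fixed-angle rotations of standard RoPE. Setting `log_a` to zero removes the decay and leaves a pure rotation.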

Cite

Text

Movahedi et al. "Selective Rotary Position Embedding." International Conference on Learning Representations, 2026.

Markdown

[Movahedi et al. "Selective Rotary Position Embedding." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/movahedi2026iclr-selective/)

BibTeX

@inproceedings{movahedi2026iclr-selective,
  title     = {{Selective Rotary Position Embedding}},
  author    = {Movahedi, Sajad and Carstensen, Timur and Afzal, Arshia and Hutter, Frank and Orvieto, Antonio and Cevher, Volkan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/movahedi2026iclr-selective/}
}