Fast Gradient Computation for RoPE Attention in Almost Linear Time
Abstract
The Rotary Position Embedding (RoPE) mechanism has become a powerful enhancement to the Transformer architecture, enabling models to capture token relationships while encoding positional information. However, RoPE complicates the attention computation, making efficient algorithms challenging to design. Earlier research introduced almost linear time algorithms for the forward computation under a bounded-entry condition on the parameters (i.e., in time $n^{1+o(1)}$ where $n$ is the number of input tokens), but did not address the backward computation. In this work, we develop the first almost linear time algorithm for the backward computation in RoPE-based attention under bounded entries. Our approach builds on recent advancements in fast RoPE attention computation, utilizing a novel combination of the polynomial method and the Fast Fourier Transform. Furthermore, using lower bounds derived from the Strong Exponential Time Hypothesis (SETH), we show that the bounded-entry condition is necessary for subquadratic performance.
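For context, here is a minimal NumPy sketch of the standard RoPE rotation that the paper's attention algorithm operates over; it is not the paper's fast algorithm. The function name `rope` and the base 10000 are conventional choices assumed for illustration.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to a sequence of vectors.

    x: array of shape (n, d) with d even; row i is the embedding at position i.
    Each consecutive pair of coordinates is rotated by an angle proportional
    to the position, with per-pair frequencies base^(-2j/d).
    """
    n, d = x.shape
    freqs = base ** (-np.arange(0, d, 2) / d)          # (d/2,) rotation frequencies
    angles = np.arange(n)[:, None] * freqs[None, :]    # (n, d/2) position * frequency
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                    # split into coordinate pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                 # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Usage: rotate queries and keys before the attention score computation.
# The key property is that <rope(Q)[i], rope(K)[j]> depends only on the
# relative offset i - j, which is the structure that FFT-based approaches exploit.
Q = np.random.randn(8, 4)
K = np.random.randn(8, 4)
scores = rope(Q) @ rope(K).T
```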
Cite
Text
Chen et al. "Fast Gradient Computation for RoPE Attention in Almost Linear Time." ICLR 2025 Workshops: SCOPE, 2025.
Markdown
[Chen et al. "Fast Gradient Computation for RoPE Attention in Almost Linear Time." ICLR 2025 Workshops: SCOPE, 2025.](https://mlanthology.org/iclrw/2025/chen2025iclrw-fast/)
BibTeX
@inproceedings{chen2025iclrw-fast,
title = {{Fast Gradient Computation for RoPE Attention in Almost Linear Time}},
author = {Chen, Yifang and Huo, Jiayan and Li, Xiaoyu and Liang, Yingyu and Shi, Zhenmei and Song, Zhao},
booktitle = {ICLR 2025 Workshops: SCOPE},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/chen2025iclrw-fast/}
}