RandAR: Decoder-Only Autoregressive Visual Generation in Random Orders
Abstract
We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of generating images in arbitrary token orders. Unlike previous decoder-only AR models that rely on a predefined generation order, RandAR removes this inductive bias, unlocking new capabilities in decoder-only generation. Our essential design enabling random order is to insert a "position instruction token" before each image token to be predicted, representing the spatial location of the next image token. Trained on randomly permuted token sequences, a more challenging task than fixed-order generation, RandAR achieves performance comparable to its conventional raster-order counterpart. More importantly, decoder-only transformers trained on random orders acquire new capabilities. To address the efficiency bottleneck of AR models, RandAR adopts parallel decoding with KV-Cache at inference time, achieving a 2.5x acceleration without sacrificing generation quality. Additionally, RandAR supports inpainting, outpainting, and resolution extrapolation in a zero-shot manner. We hope RandAR inspires new directions for decoder-only visual generation models and broadens their applications across diverse scenarios. Our project page is at https://rand-ar.github.io/.
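The interleaving of position instruction tokens with image tokens described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `interleave_random_order` and `restore_raster`, and the `("pos", index)` / `("img", token)` tuple representation, are hypothetical and chosen only for illustration. In an actual model the position instruction tokens would presumably be embeddings rather than tuples.

```python
import random

def interleave_random_order(raster_tokens, rng):
    """Build [pos, img, pos, img, ...] for a random permutation of positions.

    Hypothetical helper: `raster_tokens` is a list of codebook indices in
    raster order; `rng` is a random.Random instance.
    """
    order = list(range(len(raster_tokens)))
    rng.shuffle(order)  # random generation order over raster indices
    sequence = []
    for idx in order:
        # Position instruction token: names the location of the next image token.
        sequence.append(("pos", idx))
        # Image token to be predicted at that location.
        sequence.append(("img", raster_tokens[idx]))
    return sequence, order

def restore_raster(sequence, num_tokens):
    """Place each image token back at the location named by its position token."""
    raster = [None] * num_tokens
    for (kind_p, idx), (kind_x, tok) in zip(sequence[0::2], sequence[1::2]):
        assert kind_p == "pos" and kind_x == "img"
        raster[idx] = tok
    return raster

# Toy 2x2 grid of codebook indices in raster order (made-up values).
tokens = [11, 22, 33, 44]
seq, order = interleave_random_order(tokens, random.Random(0))
assert restore_raster(seq, len(tokens)) == tokens
```

Because every image token is preceded by a token naming its location, the sequence can be decoded back to raster order whatever permutation was sampled. That property is what makes a fixed generation order unnecessary in this toy setting.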
Cite
Text
Pang et al. "RandAR: Decoder-Only Autoregressive Visual Generation in Random Orders." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00014
Markdown
[Pang et al. "RandAR: Decoder-Only Autoregressive Visual Generation in Random Orders." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/pang2025cvpr-randar/) doi:10.1109/CVPR52734.2025.00014
BibTeX
@inproceedings{pang2025cvpr-randar,
title = {{RandAR: Decoder-Only Autoregressive Visual Generation in Random Orders}},
author = {Pang, Ziqi and Zhang, Tianyuan and Luan, Fujun and Man, Yunze and Tan, Hao and Zhang, Kai and Freeman, William T. and Wang, Yu-Xiong},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {45-55},
doi = {10.1109/CVPR52734.2025.00014},
url = {https://mlanthology.org/cvpr/2025/pang2025cvpr-randar/}
}