AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Abstract

We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating k-nearest-neighbor retrievals at the patch level. Prior methods perform a single, static retrieval before generation and condition the entire generation on fixed reference images. AR-RAG instead performs context-aware retrievals at each generation step, using previously generated patches as queries to retrieve and incorporate the most relevant patch-level visual references. This lets the model respond to evolving generation needs while avoiding limitations prevalent in existing methods, such as over-copying and stylistic bias. To realize AR-RAG, we propose two parallel frameworks: (1) Distribution-Augmentation in Decoding (DAiD), a training-free plug-and-use decoding strategy that directly merges the distribution of model-predicted patches with the distribution of retrieved patches, and (2) Feature-Augmentation in Decoding (FAiD), a parameter-efficient fine-tuning method that progressively smooths the features of retrieved patches via multi-scale convolution operations and leverages them to augment the image generation process. We validate AR-RAG on widely adopted benchmarks, including Midjourney-30K, GenEval, and DPG-Bench, demonstrating significant performance gains over state-of-the-art image generation models.
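
To make the DAiD idea concrete, the following is a minimal sketch of one decoding step. It is not the authors' implementation. The abstract does not specify how the two distributions are merged, so this sketch assumes a simple linear interpolation with a distance-weighted retrieval distribution. The function names (`knn_lookup`, `retrieval_distribution`, `merge_distributions`), the mixing weight `lam`, the `temperature`, and the toy database are all hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def knn_lookup(query, keys, values, k):
    # Return the token ids and L2 distances of the k database entries
    # whose key vectors are closest to `query`.
    dists = np.linalg.norm(keys - query, axis=1)
    idx = np.argsort(dists)[:k]
    return values[idx], dists[idx]

def retrieval_distribution(tokens, dists, vocab_size, temperature=1.0):
    # Assumption: closer neighbors get more mass via softmax(-distance / T).
    weights = softmax(-np.asarray(dists, dtype=float) / temperature)
    p = np.zeros(vocab_size)
    # Accumulate weights of neighbors that share the same token id.
    np.add.at(p, np.asarray(tokens, dtype=int), weights)
    return p

def merge_distributions(model_logits, tokens, dists, lam=0.3, temperature=1.0):
    # Assumption: linear interpolation between the model's distribution
    # and the retrieval distribution, controlled by `lam`.
    p_model = softmax(model_logits)
    p_retr = retrieval_distribution(tokens, dists, p_model.shape[0], temperature)
    return (1.0 - lam) * p_model + lam * p_retr

# Toy usage: one generation step with a random stand-in for the model.
rng = np.random.default_rng(0)
vocab_size, dim = 8, 4
keys = rng.normal(size=(50, dim))             # toy context features of database patches
values = rng.integers(0, vocab_size, 50)      # toy codebook ids of those patches
query = rng.normal(size=dim)                  # stands in for features of already generated patches
tokens, dists = knn_lookup(query, keys, values, k=5)
p_next = merge_distributions(rng.normal(size=vocab_size), tokens, dists)
next_patch = rng.choice(vocab_size, p=p_next) # sample the next patch token
```

In an autoregressive loop, the query would be rebuilt from the newly generated patches at every step, so the retrieved neighbors change as the image is generated. Because the merge only touches the output distribution, a scheme like this needs no parameter updates, which is consistent with DAiD being described as training-free.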

Cite

Text

Qi et al. "AR-RAG: Autoregressive Retrieval Augmentation for Image Generation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Qi et al. "AR-RAG: Autoregressive Retrieval Augmentation for Image Generation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/qi2025neurips-arrag/)

BibTeX

@inproceedings{qi2025neurips-arrag,
  title     = {{AR-RAG: Autoregressive Retrieval Augmentation for Image Generation}},
  author    = {Qi, Jingyuan and Xu, Zhiyang and Wang, Qifan and Huang, Lifu},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/qi2025neurips-arrag/}
}