From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation
Abstract
Autoregressive (AR) models have emerged as a powerful framework for image generation, yet they remain bound by a fundamental limitation: once a prediction is made, it cannot be revised. Each step marches forward in a strict left-to-right sequence, causing small errors to accumulate and compromise the final image. In this work, we reimagine this process with TensorAR, a decoder-only AR model that shifts from predicting discrete tokens to predicting overlapping tensor windows. This simple change transforms image synthesis into a process of next-tensor prediction, enabling the model to refine earlier outputs while preserving the causal structure that defines autoregression. To guard against information leakage during training, we introduce a discrete tensor noising mechanism inspired by discrete diffusion theory, which injects categorical noise into input tensors. TensorAR is designed to be plug-and-play: unlike masked AR methods, it requires no architectural modifications, and unlike autoregressive diffusion, it preserves the familiar AR training paradigm. We evaluate TensorAR across both class-to-image and text-to-image tasks, showing consistent gains in generation quality and instruction-following ability, while achieving a superior balance between quality and latency. In doing so, TensorAR offers a new path forward for autoregressive generation, one in which predictions are not merely produced but continually refined.
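The abstract names two mechanisms, next-tensor prediction over overlapping windows and categorical noising of input tensors, without spelling out the details. The following is a minimal sketch of how training pairs could be assembled under those two ideas. It is not the authors' implementation. Every concrete choice below is an assumption: the window size `k`, the one-position offset between input and target windows, the reserved padding id, the linear noise schedule `j / k`, and the uniform replacement kernel (the "uniform" transition used in some discrete diffusion models). The function names `make_windows`, `noise_window`, and `build_training_pair` are hypothetical.

```python
import random
from typing import List, Tuple


def make_windows(tokens: List[int], k: int, pad_id: int, start: int) -> List[List[int]]:
    """Window i holds tokens[start + i : start + i + k]; out-of-range slots get pad_id."""
    n = len(tokens)
    return [
        [tokens[j] if 0 <= j < n else pad_id for j in range(start + i, start + i + k)]
        for i in range(n)
    ]


def noise_window(window: List[int], vocab_size: int, pad_id: int,
                 rng: random.Random) -> List[int]:
    """Uniform categorical noise on every slot except slot 0 (the true previous token).

    Hypothetical schedule: slot j is resampled uniformly from the vocabulary with
    probability j / k, so later (more leaky) slots are corrupted more often.
    Padding slots are left untouched. A resampled token may equal the original,
    as with a uniform transition kernel.
    """
    k = len(window)
    noisy = list(window)
    for j in range(1, k):
        if noisy[j] != pad_id and rng.random() < j / k:
            noisy[j] = rng.randrange(vocab_size)
    return noisy


def build_training_pair(tokens: List[int], k: int, vocab_size: int,
                        rng: random.Random) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (noised input windows, clean target windows), one pair per position."""
    pad_id = vocab_size  # hypothetical: an id outside the codebook, used as BOS/EOS padding
    targets = make_windows(tokens, k, pad_id, start=0)         # window i starts at token i
    clean_inputs = make_windows(tokens, k, pad_id, start=-1)   # window i starts at token i-1
    # Input slots 1..k-1 coincide with target slots 0..k-2, which is the leakage to suppress.
    inputs = [noise_window(w, vocab_size, pad_id, rng) for w in clean_inputs]
    return inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    inputs, targets = build_training_pair([5, 2, 7, 1], k=3, vocab_size=8, rng=rng)
    for i, (x, y) in enumerate(zip(inputs, targets)):
        print(f"step {i}: input {x} -> target {y}")
```

In this sketch, each token appears in up to `k` consecutive target windows, which is what gives later steps a chance to revise an earlier prediction while each step still conditions only on past windows. Without the noising step, the clean input window would already contain most of the target window, so the model could learn to copy rather than predict; the noise only reduces this shortcut, and how strongly it should be applied is a design choice this sketch does not settle.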
Cite
Cheng et al. "From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation." International Conference on Learning Representations, 2026.
@inproceedings{cheng2026iclr-prediction,
title = {{From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation}},
author = {Cheng, Cheng and Song, Lin and An, Di and Xiao, Yicheng and Zhang, Xuchong and Sun, Hongbin and Shan, Ying},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/cheng2026iclr-prediction/}
}