Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Abstract

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but collapse when asked to magnify far beyond that regime. We address this scalability bottleneck with Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). The prompt extractor itself is fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic VLM, aligning the text guidance with human preference. Experiments show that a standard $4\times$ diffusion SR model wrapped in CoZ attains enlargements beyond $256\times$ with high perceptual quality and fidelity.
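As a rough illustration of the scale-autoregressive factorization described in the abstract (the notation here is assumed for exposition, not taken from the paper): write $x_0$ for the low-resolution input, $x_n$ for the intermediate scale-state after the $n$-th zoom step, and $c_n$ for the multi-scale-aware prompt produced by the VLM at that step. The conditional distribution over the final high-resolution image then decomposes into per-step sub-problems, each of which the fixed backbone SR model can handle:

$$
p(x_N \mid x_0) \;=\; \prod_{n=1}^{N} p\left(x_n \mid x_{n-1},\, c_n\right), \qquad c_n = \mathrm{VLM}(x_{n-1}),
$$

where the prompt extractor is shown, for concreteness, as conditioning only on the current scale-state; its exact inputs are an assumption of this sketch. Under this reading, chaining four $4\times$ steps of the same backbone yields $4^4 = 256\times$ total magnification, consistent with the figure quoted in the abstract.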

Cite

Text

Kim et al. "Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment." Advances in Neural Information Processing Systems, 2025.

Markdown

[Kim et al. "Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/kim2025neurips-chainofzoom/)

BibTeX

@inproceedings{kim2025neurips-chainofzoom,
  title     = {{Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment}},
  author    = {Kim, Bryan Sangwoo and Kim, Jeongsol and Ye, Jong Chul},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/kim2025neurips-chainofzoom/}
}