From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Le Zhuo, Liangbing Zhao, Sayak Paul, Yue Liao, Renrui Zhang, Yi Xin, Peng Gao, Mohamed Elhoseiny, Hongsheng Li

ICCV 2025 pp. 15329-15339


Abstract

Recent text-to-image diffusion models achieve impressive visual quality through extensive scaling of training data and model parameters, yet they often struggle with complex scenes and fine-grained details. Inspired by the self-reflection capabilities emergent in large language models, we propose ReflectionFlow, an inference-time framework enabling diffusion models to iteratively reflect upon and refine their outputs. ReflectionFlow introduces three complementary inference-time scaling axes: (1) noise-level scaling to optimize latent initialization; (2) prompt-level scaling for precise semantic guidance; and, most notably, (3) reflection-level scaling, which explicitly provides actionable reflections to iteratively assess and correct previous generations. To facilitate reflection-level scaling, we construct GenRef, a large-scale dataset comprising 1 million triplets, each containing a reflection, a flawed image, and an enhanced image. Leveraging this dataset, we efficiently perform reflection tuning on a state-of-the-art diffusion transformer, FLUX.1-dev, by jointly modeling multimodal inputs within a unified framework. Experimental results show that ReflectionFlow significantly outperforms naive noise-level scaling methods, offering a scalable and compute-efficient solution toward higher-quality image synthesis on challenging tasks. All code, checkpoints, and datasets are available at https://diffusion-cot.github.io/reflection2perfection.
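
The three scaling axes named in the abstract can be pictured as one loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function `reflection_flow_sketch`, its parameters, and all `toy_*` helpers are hypothetical names, and an "image" is stood in for by a single float so the example runs without any model. It shows best-of-N selection over noise seeds, followed by rounds in which a textual reflection on the current image conditions both a rewritten prompt and a refined generation.

```python
import random
from typing import Callable, List, Tuple


def reflection_flow_sketch(
    prompt: str,
    generate: Callable[[str, int], float],
    score: Callable[[float], float],
    reflect: Callable[[str, float], str],
    rewrite: Callable[[str, str], str],
    refine: Callable[[str, float, str, int], float],
    num_seeds: int = 4,
    num_rounds: int = 3,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Hypothetical loop combining the three axes; all callables are assumptions."""
    rng = random.Random(seed)

    # (1) Noise-level scaling: sample several initial seeds and keep the best.
    candidates = [generate(prompt, rng.randrange(2**31)) for _ in range(num_seeds)]
    best = max(candidates, key=score)
    history = [score(best)]

    for _ in range(num_rounds):
        # (3) Reflection-level scaling: describe what is wrong with the current image.
        note = reflect(prompt, best)
        # (2) Prompt-level scaling: fold the reflection into a more precise prompt.
        refined_prompt = rewrite(prompt, note)
        # Generate a corrected image conditioned on the flawed image and the reflection.
        candidate = refine(refined_prompt, best, note, rng.randrange(2**31))
        # Keep the candidate only if the verifier prefers it.
        if score(candidate) > score(best):
            best = candidate
        history.append(score(best))

    return best, history


# Toy stand-ins (assumptions): an "image" is a float in [0, 1] that doubles as its quality.
def toy_generate(prompt: str, seed: int) -> float:
    return random.Random(seed).random() * 0.5


def toy_score(image: float) -> float:
    return image


def toy_reflect(prompt: str, image: float) -> str:
    return f"quality {image:.2f}; add missing details"


def toy_rewrite(prompt: str, note: str) -> str:
    return f"{prompt} ({note})"


def toy_refine(prompt: str, image: float, note: str, seed: int) -> float:
    return min(1.0, image + random.Random(seed).random() * 0.2)


best, history = reflection_flow_sketch(
    "a red cube on top of a blue sphere",
    toy_generate, toy_score, toy_reflect, toy_rewrite, toy_refine,
)
print(history)
```

In a real setting the stand-ins would presumably be replaced by a diffusion model, a verifier, and a model that writes reflections; how the paper actually schedules and combines the three axes is not specified in the abstract, so the ordering here is a guess.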


Cite

Text

Zhuo et al. "From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning." International Conference on Computer Vision, 2025.

Markdown

[Zhuo et al. "From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhuo2025iccv-reflection/)

BibTeX

@inproceedings{zhuo2025iccv-reflection,
  title     = {{From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning}},
  author    = {Zhuo, Le and Zhao, Liangbing and Paul, Sayak and Liao, Yue and Zhang, Renrui and Xin, Yi and Gao, Peng and Elhoseiny, Mohamed and Li, Hongsheng},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {15329--15339},
  url       = {https://mlanthology.org/iccv/2025/zhuo2025iccv-reflection/}
}