David and Goliath: Small One-Step Model Beats Large Diffusion with Score Post-Training
Abstract
We propose Diff-Instruct (DI), a data-efficient post-training approach for one-step text-to-image generative models that improves their alignment with human preferences without requiring image data. Our method frames alignment as online reinforcement learning from human feedback (RLHF), which optimizes a human reward function while regularizing the generator to stay close to a reference diffusion process. Traditional RLHF approaches rely on the KL divergence for regularization. Instead, we introduce a novel score-based divergence regularization that substantially improves performance. Although such a score-based RLHF objective appears intractable to optimize directly, we theoretically derive a strictly equivalent, tractable loss function whose gradient can be computed efficiently. Building on this framework, we train DI-SDXL-1step, a one-step text-to-image model based on Stable Diffusion XL (2.6B parameters) that generates 1024×1024 images in a single step. The 2.6B DI-SDXL-1step model outperforms the 12B FLUX-dev model in ImageReward, PickScore, and CLIP score on the Parti prompts benchmark while using only 1.88% of FLUX-dev's inference time. This result strongly suggests that, with proper post-training, a small one-step model can beat much larger multi-step models. We will open-source our industry-ready model to the community.
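The generic form below shows how a KL regularizer differs from a score-based regularizer in an online RLHF objective. It is a sketch under stated assumptions, not the paper's derivation. All of the following notation is illustrative:

- $g_\theta$ is the one-step generator.
- $r$ is the reward model.
- $\alpha$ is the regularization strength.
- $w(t)$ is a time weighting.
- $d(\cdot)$ is a distance function.
- $p_{\theta,t}$ and $p_{\mathrm{ref},t}$ are the generator and reference distributions after diffusing to noise level $t$.

Prompt conditioning is dropped inside the divergences for brevity. The exact divergence, and the tractable loss the authors derive, may differ from this sketch.

```latex
% Illustrative online RLHF objective for a one-step generator x = g_\theta(z, c),
% with noise z ~ p(z) and prompt c ~ p(c); notation is assumed, not the paper's.
\max_{\theta}\;
  \mathbb{E}_{c \sim p(c),\, z \sim p(z)}\big[\, r\big(g_\theta(z, c), c\big) \big]
  \;-\; \alpha\, \mathcal{D}\big(p_\theta \,\|\, p_{\mathrm{ref}}\big)

% Conventional choice: KL-divergence regularization
\mathcal{D}_{\mathrm{KL}}(p_\theta \,\|\, p_{\mathrm{ref}})
  = \mathbb{E}_{x \sim p_\theta}\!\left[\log \frac{p_\theta(x)}{p_{\mathrm{ref}}(x)}\right]

% Score-based alternative: compare the scores of the diffused distributions
% across noise levels t (with d = squared L2 norm this is a weighted Fisher divergence)
\mathcal{D}_{\mathrm{score}}(p_\theta \,\|\, p_{\mathrm{ref}})
  = \int_0^T w(t)\, \mathbb{E}_{x_t \sim p_{\theta,t}}
    \Big[ d\big( \nabla_{x_t} \log p_{\theta,t}(x_t) - \nabla_{x_t} \log p_{\mathrm{ref},t}(x_t) \big) \Big]\, \mathrm{d}t
```

For an implicit one-step generator, the score $\nabla_{x_t} \log p_{\theta,t}$ generally has no closed form. This is one reason an objective of this shape looks intractable to optimize directly.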
Cite
Luo et al. "David and Goliath: Small One-Step Model Beats Large Diffusion with Score Post-Training." Proceedings of the 42nd International Conference on Machine Learning, 2025.
@inproceedings{luo2025icml-david,
title = {{David and Goliath: Small One-Step Model Beats Large Diffusion with Score Post-Training}},
author = {Luo, Weijian and Zhang, Colin and Zhang, Debing and Geng, Zhengyang},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {41520-41539},
volume = {267},
url = {https://mlanthology.org/icml/2025/luo2025icml-david/}
}