Improved Techniques for Training Smaller and Faster Stable Diffusion
Abstract
Recent SoTA text-to-image diffusion models achieve impressive generation quality, but their computational cost remains prohibitively large. Network pruning and step distillation are two widely used compression techniques for reducing model size and the number of inference steps. This work presents several improved techniques in these two aspects to train smaller and faster diffusion models at a low training cost. Specifically, compared to prior SoTA counterparts, we introduce a structured pruning method that removes insignificant weight blocks based on an improved performance-sensitivity criterion. To regain performance after pruning, we propose a CFG-aware retraining loss, which is shown to be critical to performance. Finally, a modified CFG-aware step distillation is used to reduce the number of inference steps. Empirically, our method prunes 46\% of the U-Net parameters of SD v2.1 base and reduces the inference steps from 25 to 8, achieving an overall $3.0\times$ wall-clock inference speedup. Our 8-step model is significantly better than the 25-step BK-SDM, the prior SoTA for cheap Stable Diffusion, while being even smaller.
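The abstract only names the pruning criterion, so the sketch below is purely illustrative of the general idea of sensitivity-based structured pruning, not the authors' actual criterion or code: each candidate weight block is temporarily zeroed out, the resulting increase in a calibration loss is taken as its sensitivity, and the least sensitive blocks are removed. All names here (`block_sensitivity`, `prune_least_sensitive`, `loss_fn`, `calib_batch`) are hypothetical.

```python
# Minimal sketch (not the paper's implementation): sensitivity-based
# structured pruning of weight blocks in a PyTorch model.
import torch


@torch.no_grad()
def block_sensitivity(model, blocks, calib_batch, loss_fn):
    """Score each block by how much the calibration loss grows when the
    block's weights are zeroed out (a simple proxy for importance)."""
    base_loss = loss_fn(model, calib_batch).item()
    scores = {}
    for name, module in blocks.items():
        saved = module.weight.data.clone()
        module.weight.data.zero_()                   # temporarily remove the block
        scores[name] = loss_fn(model, calib_batch).item() - base_loss
        module.weight.data.copy_(saved)              # restore original weights
    return scores


@torch.no_grad()
def prune_least_sensitive(blocks, scores, ratio):
    """Zero out the `ratio` fraction of blocks with the lowest sensitivity."""
    ranked = sorted(scores, key=scores.get)          # least sensitive first
    for name in ranked[: int(len(ranked) * ratio)]:
        blocks[name].weight.data.zero_()
```

In a full pipeline, the pruned model would then be retrained (in the paper, with the CFG-aware loss) and step-distilled to recover generation quality at the reduced size and step count.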
Cite
Wang and Wang. "Improved Techniques for Training Smaller and Faster Stable Diffusion." ICLR 2025 Workshops: DeLTa, 2025. https://mlanthology.org/iclrw/2025/wang2025iclrw-improved/

BibTeX:
@inproceedings{wang2025iclrw-improved,
title = {{Improved Techniques for Training Smaller and Faster Stable Diffusion}},
author = {Wang, Hesong and Wang, Huan},
booktitle = {ICLR 2025 Workshops: DeLTa},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/wang2025iclrw-improved/}
}