TinyFusion: Diffusion Transformers Learned Shallow
Abstract
Diffusion Transformers have demonstrated remarkable capabilities in image generation but often come with excessive parameterization, resulting in considerable inference overhead in real-world applications. In this work, we present TinyFusion, a depth pruning method designed to remove redundant layers from diffusion transformers via end-to-end learning. The core principle of our approach is to create a pruned model with high recoverability, allowing it to regain strong performance after fine-tuning. To accomplish this, we introduce a differentiable sampling technique to make pruning learnable, paired with a co-optimized parameter to simulate future fine-tuning. While prior works focus on minimizing loss or error after pruning, our method explicitly models and optimizes the post-fine-tuning performance of pruned models. Experimental results indicate that this learnable paradigm offers substantial benefits for layer pruning of diffusion transformers, surpassing existing importance-based and error-based methods. Additionally, TinyFusion exhibits strong generalization across diverse architectures, such as DiTs, MARs, and SiTs. Experiments with DiT-XL show that TinyFusion can craft a shallow diffusion transformer at less than 7% of the pre-training cost, achieving a 2× speedup with an FID score of 2.86, outperforming competitors with comparable efficiency. Code is available at https://github.com/VainF/TinyFusion.
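The abstract does not specify how the sampling is made differentiable. The following is a minimal sketch that assumes a Gumbel-Softmax relaxation over candidate keep-masks within local blocks of layers. The block size, keep count, temperature, and all function names are illustrative assumptions, not the released TinyFusion code. The co-optimized parameter that simulates fine-tuning is not modelled here. In an autograd framework, the hard mask would be used in the forward pass, and gradients would flow to the logits through the soft sample (a straight-through estimator).

```python
import itertools
import math
import random


def candidate_masks(block_size, keep):
    """Enumerate every binary mask that keeps `keep` of `block_size` layers (hypothetical helper)."""
    masks = []
    for kept in itertools.combinations(range(block_size), keep):
        masks.append([1.0 if i in kept else 0.0 for i in range(block_size)])
    return masks


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # clamp away from 0 so log(u) is defined
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_layer_mask(logits, masks, tau, rng):
    """Return (hard mask for the forward pass, soft per-layer keep weights)."""
    soft = gumbel_softmax(logits, tau, rng)
    # Hard choice: the candidate with the largest relaxed weight.
    hard_idx = max(range(len(soft)), key=soft.__getitem__)
    hard_mask = masks[hard_idx]
    # Soft keep weight of each layer = expectation of the masks under the relaxed sample.
    soft_mask = [
        sum(w * mask[i] for w, mask in zip(soft, masks))
        for i in range(len(masks[0]))
    ]
    return hard_mask, soft_mask


# Usage: halve a 28-layer stack (DiT-XL's depth) with a hypothetical 1:2 scheme.
rng = random.Random(0)
masks = candidate_masks(block_size=2, keep=1)
logits = [[0.0, 0.0] for _ in range(14)]  # 14 blocks x 2 layers = 28 layers
full_mask = []
for block_logits in logits:
    hard, _ = sample_layer_mask(block_logits, masks, tau=1.0, rng=rng)
    full_mask.extend(hard)
print(len(full_mask), sum(full_mask))  # prints: 28 14.0
```

Restricting the choice to local blocks keeps the number of candidates small. For example, keeping 1 of 2 layers gives 2 candidates per block, rather than one candidate for every 14-of-28 subset of the whole network. Every sampled mask therefore removes exactly the target number of layers.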
Cite
Fang et al. "TinyFusion: Diffusion Transformers Learned Shallow." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01691
[Fang et al. "TinyFusion: Diffusion Transformers Learned Shallow." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/fang2025cvpr-tinyfusion/) doi:10.1109/CVPR52734.2025.01691
@inproceedings{fang2025cvpr-tinyfusion,
title = {{TinyFusion: Diffusion Transformers Learned Shallow}},
author = {Fang, Gongfan and Li, Kunjun and Ma, Xinyin and Wang, Xinchao},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {18144-18154},
doi = {10.1109/CVPR52734.2025.01691},
url = {https://mlanthology.org/cvpr/2025/fang2025cvpr-tinyfusion/}
}