Statistical Foundations of Conditional Diffusion Transformers

Abstract

We explore the statistical foundations of conditional diffusion transformers (DiTs) with classifier-free guidance. Through a comprehensive analysis of "in-context" conditional DiTs under four common data assumptions, we show that both conditional DiTs and their latent variants lead to the minimax optimality of unconditional DiTs. By discretizing the input domain into infinitesimal grids and performing a term-by-term Taylor expansion of the conditional score function, we exploit transformers' universal approximation capability through a finer piecewise-constant approximation, yielding tighter bounds. Extending the analysis to the latent setting under a linear latent subspace assumption, we show that latent conditional DiTs achieve lower approximation and estimation bounds than their non-latent counterparts, and we further establish the minimax optimality of latent unconditional DiTs. Our findings characterize the statistical limits of conditional and unconditional DiTs and offer practical guidance for developing more efficient and accurate models.
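To make the grid-discretization idea concrete, here is a simplified illustration (not the paper's exact construction): a zeroth-order piecewise-constant approximation of a Hölder-smooth function on a uniform grid, the building block that Taylor expansion then refines to higher order.

```latex
% Partition [0,1]^d into cubes C_k of side length h with centers x_k.
% For an L-Lipschitz Hölder-\alpha function f, the piecewise-constant
% approximant
\hat{f}(x) \;=\; \sum_{k} f(x_k)\,\mathbf{1}\{x \in C_k\}
% satisfies the uniform error bound (cell diameter is h\sqrt{d}):
\qquad
\sup_{x \in [0,1]^d} \bigl|f(x) - \hat{f}(x)\bigr|
\;\le\; L\,\bigl(h\sqrt{d}\bigr)^{\alpha}.
```

Shrinking the grid width h tightens the error at the cost of more cells; replacing the constant f(x_k) with a truncated Taylor polynomial on each cell is what yields the sharper rates discussed above.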

Cite

Text

Hu et al. "Statistical Foundations of Conditional Diffusion Transformers." ICLR 2025 Workshops: DeLTa, 2025.

Markdown

[Hu et al. "Statistical Foundations of Conditional Diffusion Transformers." ICLR 2025 Workshops: DeLTa, 2025.](https://mlanthology.org/iclrw/2025/hu2025iclrw-statistical/)

BibTeX

@inproceedings{hu2025iclrw-statistical,
  title     = {{Statistical Foundations of Conditional Diffusion Transformers}},
  author    = {Hu, Jerry Yao-Chieh and Wu, Weimin and Lee, Yi-Chen and Huang, Yu-Chao and Chen, Minshuo and Liu, Han},
  booktitle = {ICLR 2025 Workshops: DeLTa},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/hu2025iclrw-statistical/}
}