Patronus: Interpretable Diffusion Models with Prototypes

Abstract

Addressing the opacity of diffusion-based generative models is urgently needed, as their applications continue to expand while their underlying procedures largely remain a black box. Motivated by a critical question -- how can the diffusion generation process be interpreted and understood? -- we propose *Patronus*, an interpretable diffusion model that incorporates a prototypical network to encode semantics in visual patches, revealing *what* visual patterns are learned and *where* and *when* they emerge throughout denoising. This interpretability provides deeper insight into the generative mechanism, enabling the detection of shortcut learning via unwanted correlations and the tracing of semantic emergence across timesteps. We evaluate *Patronus* on four natural image datasets and one medical imaging dataset, demonstrating both faithful interpretability and strong generative performance. With this work, we open new avenues for understanding and steering diffusion models through prototype-based interpretability. Our code is available at [nina-weng.github.io/patronus.github.io](https://nina-weng.github.io/patronus.github.io/).

Cite

Text

Weng et al. "Patronus: Interpretable Diffusion Models with Prototypes." International Conference on Learning Representations, 2026.

Markdown

[Weng et al. "Patronus: Interpretable Diffusion Models with Prototypes." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/weng2026iclr-patronus/)

BibTeX

@inproceedings{weng2026iclr-patronus,
  title     = {{Patronus: Interpretable Diffusion Models with Prototypes}},
  author    = {Weng, Nina and Feragen, Aasa and Bigdeli, Siavash},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/weng2026iclr-patronus/}
}