Reconciling Visual Perception and Generation in Diffusion Models

Li, Liulei; Yang, Yi; Wang, Wenguan

ICLR 2026

Abstract

We present GenRep, a unified image understanding and synthesis model that jointly performs discriminative learning and generative modeling in a single training session. Using Monte Carlo approximation, GenRep distills the distributional knowledge embedded in diffusion models to guide discriminative learning for visual perception tasks. At the same time, it establishes a semantic-driven image generation process in which high-level semantics learned from perception tasks inform image synthesis, creating a positive feedback loop in which the two tasks boost each other. Moreover, to reconcile the learning of both tasks, a gradient alignment strategy is proposed that symmetrically modifies the optimization directions of the perception and generation losses. These designs make GenRep a versatile and powerful model that achieves leading performance on both image understanding and generation benchmarks.

Cite

Text

Li et al. "Reconciling Visual Perception and Generation in Diffusion Models." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "Reconciling Visual Perception and Generation in Diffusion Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-reconciling/)

BibTeX

@inproceedings{li2026iclr-reconciling,
  title     = {{Reconciling Visual Perception and Generation in Diffusion Models}},
  author    = {Li, Liulei and Yang, Yi and Wang, Wenguan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-reconciling/}
}