Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge

Abstract

Recent advances in generative modeling have positioned diffusion models as state-of-the-art tools for sampling from complex data distributions. While these models have shown remarkable success across single-modality domains such as images and audio, extending their capabilities to *Modality Translation (MT)*, translating information across different sensory modalities, remains an open challenge. Existing approaches often rely on restrictive assumptions, including shared dimensionality, Gaussian source priors, and modality-specific architectures, which limit their generality and theoretical grounding. In this work, we propose the Latent Denoising Diffusion Bridge Model (LDDBM), a general-purpose framework for modality translation based on a latent-variable extension of Denoising Diffusion Bridge Models. By operating in a shared latent space, our method learns a bridge between arbitrary modalities without requiring aligned dimensions. We introduce a contrastive alignment loss to enforce semantic consistency between paired samples and design a domain-agnostic encoder-decoder architecture tailored for noise prediction in latent space. Additionally, we propose a predictive loss to guide training toward accurate cross-domain translation and explore several training strategies to improve stability. Our approach supports arbitrary modality pairs and performs strongly on diverse MT tasks, including multi-view to 3D shape generation, image super-resolution, and multi-view scene synthesis. Comprehensive experiments and ablations validate the effectiveness of our framework, establishing a new strong baseline in general modality translation. For more information, see our project page: https://sites.google.com/view/lddbm/home.
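The abstract does not state the exact form of the contrastive alignment loss. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style objective over a batch of paired latent vectors, which is one common way to pull paired samples together and push unpaired ones apart. The function names (`l2_normalize`, `info_nce`), the `temperature` parameter, and the plain-list representation of latents are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(src_latents, tgt_latents, temperature=0.1):
    """Symmetric InfoNCE over paired latents (assumed loss form, not the paper's).

    src_latents[i] and tgt_latents[i] are treated as a positive pair;
    every other combination in the batch acts as a negative.
    """
    src = [l2_normalize(v) for v in src_latents]
    tgt = [l2_normalize(v) for v in tgt_latents]
    n = len(src)

    # logits[i][j] = cosine similarity between source i and target j, scaled.
    logits = [
        [sum(a * b for a, b in zip(src[i], tgt[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Transpose to score the target-to-source direction as well.
    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


# Toy 2-D latents: correctly paired batches should score lower than shuffled ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In a setup like the one the abstract describes, such a term would presumably be computed on the encoder outputs of the two modalities and combined with the bridge and predictive losses. How the terms are weighted is not given in the abstract.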

Cite

Text

Berman et al. "Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge." Advances in Neural Information Processing Systems, 2025.

Markdown

[Berman et al. "Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/berman2025neurips-general/)

BibTeX

@inproceedings{berman2025neurips-general,
  title     = {{Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge}},
  author    = {Berman, Nimrod and Joglekar, Omkar and Kosman, Eitan and Di Castro, Dotan and Azencot, Omri},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/berman2025neurips-general/}
}