VCT: Training Consistency Models with Variational Noise Coupling

Abstract

Consistency Training (CT) has recently emerged as a strong alternative to diffusion models for image generation. However, non-distillation CT often suffers from high variance and instability, motivating ongoing research into its training dynamics. We propose Variational Consistency Training (VCT), a flexible and effective framework compatible with various forward kernels, including those in flow matching. Its key innovation is a learned noise-data coupling scheme inspired by Variational Autoencoders, where a data-dependent encoder models noise emission. This enables VCT to adaptively learn noise-to-data pairings, reducing training variance relative to the fixed, unsorted pairings in classical CT. Experiments on multiple image datasets demonstrate significant improvements: our method surpasses baselines, achieves state-of-the-art FID among non-distillation CT approaches on CIFAR-10, and matches state-of-the-art performance on ImageNet 64x64 with only two sampling steps. Code is available at https://github.com/sony/vct.
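
To make the abstract's central idea concrete, here is a minimal, hedged sketch (not the authors' code; see the repository above for the actual implementation) of a VAE-style noise coupling in a consistency-training step: a data-dependent encoder q(z | x) emits the noise via the reparameterization trick, a KL term regularizes it toward the N(0, I) prior, and a consistency loss matches model outputs at adjacent times on a flow-matching-style interpolation path. All module names, network sizes, and the time schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NoiseEncoder(nn.Module):
    """Amortized encoder q(z | x): a data-dependent Gaussian over noise.

    Illustrative MLP; the paper's encoder architecture may differ.
    """
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, 2 * dim))

    def forward(self, x):
        mu, log_var = self.net(x).chunk(2, dim=-1)
        return mu, log_var

class ConsistencyNet(nn.Module):
    """Toy consistency model f(x_t, t); a real model would be a U-Net."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))

def vct_step(f, f_ema, enc, x, kl_weight=1e-3):
    """One sketched training step: consistency loss on coupled (x, z) pairs + KL.

    In classical CT, z would be drawn independently of x; here the learned
    coupling q(z | x) picks the noise, which is what reduces training variance.
    """
    # Sample noise from the learned coupling via the reparameterization trick.
    mu, log_var = enc(x)
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
    # Two adjacent times on a linear (flow-matching-style) interpolation path.
    t = torch.rand(x.shape[0], 1)
    s = (t - 0.05).clamp(min=0.0)
    x_t = (1 - t) * x + t * z
    x_s = (1 - s) * x + s * z
    # Consistency loss: outputs at adjacent points of the same path should match;
    # the target comes from an EMA copy and is detached, as in standard CT.
    loss_ct = (f(x_t, t) - f_ema(x_s, s).detach()).pow(2).mean()
    # KL(q(z | x) || N(0, I)) keeps the coupling's noise marginal near the prior.
    kl = 0.5 * (mu.pow(2) + log_var.exp() - 1 - log_var).sum(-1).mean()
    return loss_ct + kl_weight * kl
```

Because the KL target is the standard normal prior, the aggregate noise distribution stays valid for sampling, while gradients through the reparameterized z let the encoder learn which noise to pair with each data point.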

Cite

Text

Silvestri et al. "VCT: Training Consistency Models with Variational Noise Coupling." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Silvestri et al. "VCT: Training Consistency Models with Variational Noise Coupling." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/silvestri2025icml-vct/)

BibTeX

@inproceedings{silvestri2025icml-vct,
  title     = {{VCT: Training Consistency Models with Variational Noise Coupling}},
  author    = {Silvestri, Gianluigi and Ambrogioni, Luca and Lai, Chieh-Hsin and Takida, Yuhta and Mitsufuji, Yuki},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {55657--55683},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/silvestri2025icml-vct/}
}