VCT: Training Consistency Models with Variational Noise Coupling
Abstract
Consistency Training (CT) has recently emerged as a strong alternative to diffusion models for image generation. However, non-distillation CT often suffers from high variance and instability, motivating ongoing research into its training dynamics. We propose Variational Consistency Training (VCT), a flexible and effective framework compatible with various forward kernels, including those in flow matching. Its key innovation is a learned noise-data coupling scheme inspired by Variational Autoencoders, where a data-dependent encoder models noise emission. This enables VCT to adaptively learn noise-to-data pairings, reducing training variance relative to the fixed, unsorted pairings in classical CT. Experiments on multiple image datasets demonstrate significant improvements: our method surpasses baselines, achieves state-of-the-art FID among non-distillation CT approaches on CIFAR-10, and matches SoTA performance on ImageNet 64x64 with only two sampling steps. Code is available at https://github.com/sony/vct.
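The core idea in the abstract, a data-dependent encoder that emits the noise coupled to each data point, with the pair then mixed through a flow-matching-style forward kernel, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoder here is a fixed random linear map, and all names (`encode`, `reparameterize`, `kl_to_standard_normal`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a batch of flattened "data" vectors.
D = 4
x = rng.normal(size=(8, D))

# A data-dependent "encoder" emitting a Gaussian q(z | x) over noise.
# A fixed random linear map stands in for a learned network.
W_mu = rng.normal(size=(D, D)) * 0.1
W_logvar = rng.normal(size=(D, D)) * 0.1

def encode(x):
    # Predict mean and log-variance of the noise distribution for each x.
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    # VAE-style reparameterization: sample z ~ q(z | x).
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL(q(z|x) || N(0, I)): a regularizer keeping the noise marginal
    # close to the Gaussian prior used at sampling time.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

mu, logvar = encode(x)
z = reparameterize(mu, logvar, rng)

# Flow-matching-style forward kernel: linear interpolation between
# data x (t = 0) and its coupled noise z (t = 1).
t = 0.7
x_t = (1.0 - t) * x + t * z

print(x_t.shape, float(kl_to_standard_normal(mu, logvar).mean()))
```

In this sketch the learned pairing replaces the independent Gaussian draw of classical CT: because z depends on x, the interpolant x_t varies less across resampled noise, which is the variance-reduction mechanism the abstract describes.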
Cite
Text
Silvestri et al. "VCT: Training Consistency Models with Variational Noise Coupling." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Silvestri et al. "VCT: Training Consistency Models with Variational Noise Coupling." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/silvestri2025icml-vct/)

BibTeX
@inproceedings{silvestri2025icml-vct,
title = {{VCT: Training Consistency Models with Variational Noise Coupling}},
author = {Silvestri, Gianluigi and Ambrogioni, Luca and Lai, Chieh-Hsin and Takida, Yuhta and Mitsufuji, Yuki},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {55657--55683},
volume = {267},
url = {https://mlanthology.org/icml/2025/silvestri2025icml-vct/}
}