Phased Consistency Models

Abstract

Consistency Models (CMs) have made significant progress in accelerating the generation of diffusion models. However, their application to high-resolution, text-conditioned image generation in the latent space remains unsatisfactory. In this paper, we identify three key flaws in the current design of Latent Consistency Models (LCMs). We investigate the reasons behind these limitations and propose Phased Consistency Models (PCMs), which generalize the design space and address the identified limitations. Our evaluations demonstrate that PCMs outperform LCMs across 1–16 step generation settings. While PCMs are specifically designed for multi-step refinement, their 1-step generation results are comparable to those of previous state-of-the-art methods designed specifically for 1-step generation. Furthermore, we show that the PCM methodology is versatile and applicable to video generation, enabling us to train a state-of-the-art few-step text-to-video generator. Our code is available at https://github.com/G-U-N/Phased-Consistency-Model.
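
For readers unfamiliar with few-step generation, the sketch below illustrates the standard multistep consistency sampling loop that settings such as "1–16 step generation" refer to: the model maps a noisy sample at a given noise level directly to a clean-sample estimate, and each intermediate step re-noises that estimate to the next, lower noise level. This is a minimal illustration, not the authors' implementation; the model(x, sigma) interface and the sigmas schedule are assumed names, and the actual PCM code in the linked repository should be consulted for the real API.

import torch

@torch.no_grad()
def multistep_consistency_sampling(model, sigmas, shape, device="cpu"):
    # `model(x, sigma)` is assumed to be a consistency function that maps a
    # noisy sample at noise level `sigma` to a clean-sample estimate.
    # `sigmas` is a decreasing sequence of noise levels, e.g. 4 entries for
    # 4-step generation. Both names are illustrative assumptions.
    x = torch.randn(shape, device=device) * sigmas[0]  # start from pure noise
    for i, sigma in enumerate(sigmas):
        x0 = model(x, sigma)  # jump directly to a clean estimate
        if i + 1 < len(sigmas):
            # re-noise the estimate to the next (lower) noise level
            x = x0 + sigmas[i + 1] * torch.randn_like(x0)
    return x0

For example, 4-step generation would run this loop with a schedule of four decreasing noise levels. PCMs, as the name suggests, train with the trajectory split into phases, which is what makes this kind of multi-step refinement well-behaved compared to a single-phase LCM.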

Cite

Text

Wang et al. "Phased Consistency Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-2668

Markdown

[Wang et al. "Phased Consistency Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wang2024neurips-phased/) doi:10.52202/079017-2668

BibTeX

@inproceedings{wang2024neurips-phased,
  title     = {{Phased Consistency Models}},
  author    = {Wang, Fu-Yun and Huang, Zhaoyang and Bergman, Alexander William and Shen, Dazhong and Gao, Peng and Lingelbach, Michael and Sun, Keqiang and Bian, Weikang and Song, Guanglu and Liu, Yu and Wang, Xiaogang and Li, Hongsheng},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2668},
  url       = {https://mlanthology.org/neurips/2024/wang2024neurips-phased/}
}