Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
Abstract
Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own COrrelation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks. It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a 10x smaller model using significantly less computation than the prior art.
Cite
Text
Ge et al. "Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.02096
Markdown
[Ge et al. "Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/ge2023iccv-preserve/) doi:10.1109/ICCV51070.2023.02096
BibTeX
@inproceedings{ge2023iccv-preserve,
title = {{Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models}},
author = {Ge, Songwei and Nah, Seungjun and Liu, Guilin and Poon, Tyler and Tao, Andrew and Catanzaro, Bryan and Jacobs, David and Huang, Jia-Bin and Liu, Ming-Yu and Balaji, Yogesh},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {22930-22941},
doi = {10.1109/ICCV51070.2023.02096},
url = {https://mlanthology.org/iccv/2023/ge2023iccv-preserve/}
}