FIFO-Diffusion: Generating Infinite Videos from Text Without Training
Abstract
We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional training. This is achieved by iteratively performing diagonal denoising, which simultaneously processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail. However, diagonal denoising is a double-edged sword: frames near the tail can take advantage of cleaner frames through forward reference, but this strategy induces a discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. In practice, given a baseline model, FIFO-Diffusion consumes a constant amount of memory regardless of the target video length, and it is well suited to parallel inference on multiple GPUs. We demonstrate promising results and the effectiveness of the proposed methods on existing text-to-video generation baselines. Generated video examples and source code are available on our project page.
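The queue mechanics of diagonal denoising can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are reduced to scalar latents, `toy_denoiser` and the `Denoiser` interface are hypothetical stand-ins for the pretrained video model, noise levels are integer indices rather than a real sampler's timesteps, and the queue is initialized with pure noise at every level as a simplification. Latent partitioning and lookahead denoising are omitted.

```python
import random
from collections import deque
from itertools import islice
from typing import Callable, Iterator, List

# Hypothetical model interface (an assumption, not the authors' API): given the
# latents of every frame in the queue and each frame's noise level, return the
# latents after ONE denoising step per frame (level t -> t - 1). A real video
# diffusion model would process these frames jointly.
Denoiser = Callable[[List[float], List[int]], List[float]]


def toy_denoiser(latents: List[float], levels: List[int]) -> List[float]:
    """Placeholder: one step scales a latent by (t - 1) / t, so a step from level 1 gives 0.0."""
    return [x * (t - 1) / t for x, t in zip(latents, levels)]


def fifo_stream(denoise: Denoiser, num_levels: int, seed: int = 0) -> Iterator[float]:
    """Yield frames indefinitely using a fixed-length queue of noisy latents.

    Queue position i (0 = head) always holds a frame at noise level i + 1,
    so the noise level increases from head to tail.
    """
    rng = random.Random(seed)
    levels = list(range(1, num_levels + 1))  # head -> tail
    # Simplification: start from pure noise at every level.
    queue = deque(rng.gauss(0.0, 1.0) for _ in levels)
    while True:
        # Diagonal denoising: a single call advances every frame by one step.
        queue = deque(denoise(list(queue), levels))
        # The head frame has reached level 0, i.e. it is fully denoised.
        clean = queue.popleft()
        # Enqueue fresh random noise at the tail (highest noise level).
        queue.append(rng.gauss(0.0, 1.0))
        yield clean


# Usage: take as many frames as needed; the queue never grows.
frames = list(islice(fifo_stream(toy_denoiser, num_levels=16), 100))
print(len(frames), all(x == 0.0 for x in frames))  # 100 True
```

Because the queue always holds exactly `num_levels` frames, each output frame costs one model call over a fixed-size input, which is why memory stays constant no matter how many frames are drawn from the stream. A frame that enters at the tail receives exactly `num_levels` denoising steps before it reaches the head.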
Cite
Kim et al. "FIFO-Diffusion: Generating Infinite Videos from Text Without Training." Neural Information Processing Systems, 2024. doi:10.52202/079017-2853
[Kim et al. "FIFO-Diffusion: Generating Infinite Videos from Text Without Training." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/kim2024neurips-fifodiffusion/) doi:10.52202/079017-2853
@inproceedings{kim2024neurips-fifodiffusion,
title = {{FIFO-Diffusion: Generating Infinite Videos from Text Without Training}},
author = {Kim, Jihwan and Kang, Junoh and Choi, Jinyoung and Han, Bohyung},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-2853},
url = {https://mlanthology.org/neurips/2024/kim2024neurips-fifodiffusion/}
}