Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions

Abstract

We present Phenaki, a model capable of realistic video synthesis given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, the limited quantity of high-quality text-video data, and the variable length of videos. To address these issues, we introduce a new causal model for learning video representations, which compresses a video into a small representation of discrete tokens. This tokenizer is auto-regressive in time, which allows it to work with videos of variable length. To generate video tokens from text, we use a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address the data issues, we demonstrate how joint training on a large corpus of image-text pairs together with a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos in an open domain, conditioned on a sequence of prompts (i.e., time-variable text or a story). To the best of our knowledge, this is the first paper to study generating videos from time-variable prompts.
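The abstract states that the tokenizer is causal and auto-regressive in time but does not give its attention pattern. One common way to get that property in a transformer, assumed here purely for illustration, is a block-causal mask. Tokens are laid out time-major, attend freely within their own time step, and never attend to later time steps. The function name `temporal_causal_mask` and the parameter `tokens_per_step` are hypothetical and do not come from the paper.

```python
import numpy as np


def temporal_causal_mask(num_time_steps: int, tokens_per_step: int) -> np.ndarray:
    """Boolean attention mask (hypothetical sketch): mask[i, j] is True when
    query token i may attend to key token j.

    Tokens are flattened time-major: the first `tokens_per_step` tokens belong
    to time step 0, the next block to time step 1, and so on. Attention is full
    within a time step and causal across time steps.
    """
    # Time-step index of every token in the flattened sequence.
    step_of_token = np.repeat(np.arange(num_time_steps), tokens_per_step)
    # Query i may attend to key j iff key j's time step is not later than query i's.
    return step_of_token[None, :] <= step_of_token[:, None]


# A short clip's mask is exactly the top-left block of a longer clip's mask,
# so earlier tokens are unaffected by how many frames follow them.
short = temporal_causal_mask(num_time_steps=2, tokens_per_step=4)
long = temporal_causal_mask(num_time_steps=5, tokens_per_step=4)
print(np.array_equal(long[:8, :8], short))
```

Under this assumption, the prefix property is what lets one model encode clips of any length: adding frames appends rows and columns to the mask without changing the tokens already computed for earlier frames.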

Cite

Text

Villegas et al. "Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions." International Conference on Learning Representations, 2023.

Markdown

[Villegas et al. "Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/villegas2023iclr-phenaki/)

BibTeX

@inproceedings{villegas2023iclr-phenaki,
  title     = {{Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions}},
  author    = {Villegas, Ruben and Babaeizadeh, Mohammad and Kindermans, Pieter-Jan and Moraldo, Hernan and Zhang, Han and Saffar, Mohammad Taghi and Castro, Santiago and Kunze, Julius and Erhan, Dumitru},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/villegas2023iclr-phenaki/}
}