Towards Smooth Video Composition

Abstract

Video generation, which aims to produce a sequence of frames, requires synthesizing consistent and persistent dynamic content over time. This work investigates how to model the temporal relations for composing a video with an arbitrary number of frames, from a few to even infinitely many, using generative adversarial networks (GANs). First, towards composing adjacent frames, we show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings smooth frame transitions without harming the per-frame quality. Second, by incorporating a temporal shift module (TSM), originally designed for video understanding, into the discriminator, we enable the generator to synthesize more reasonable dynamics. Third, we develop a novel B-Spline-based motion representation to ensure temporal smoothness, and hence achieve infinite-length video generation, going beyond the number of frames used in training. We evaluate our approach on a range of datasets and show substantial improvements over baselines on video generation. Code and models are publicly available at https://genforce.github.io/StyleSV.
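
The abstract mentions inserting a temporal shift module (TSM) into the discriminator so that per-frame features exchange information across time. The snippet below is a minimal, generic PyTorch sketch of what a TSM-style channel shift does; it is not the authors' discriminator code, and the tensor layout and shift fraction are assumptions borrowed from the original TSM formulation.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the time axis (generic TSM sketch).

    Assumes x has shape (batch, time, channels, height, width); the 1/shift_div
    split between forward- and backward-shifted channels is an assumption.
    """
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift first fold of channels forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift second fold backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # leave remaining channels untouched
    return out

# Toy usage: mix temporal context into per-frame discriminator features.
frames = torch.randn(2, 16, 64, 32, 32)  # (batch, frames, channels, h, w)
mixed = temporal_shift(frames)
print(mixed.shape)  # torch.Size([2, 16, 64, 32, 32])
```

The shift is parameter-free, so a frame-level discriminator can see short-range motion without adding 3D convolutions; this matches the general motivation stated in the abstract, though the exact placement inside the network is described only in the full paper.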

Cite

Text

Zhang et al. "Towards Smooth Video Composition." International Conference on Learning Representations, 2023.

Markdown

[Zhang et al. "Towards Smooth Video Composition." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/zhang2023iclr-smooth/)

BibTeX

@inproceedings{zhang2023iclr-smooth,
  title     = {{Towards Smooth Video Composition}},
  author    = {Zhang, Qihang and Yang, Ceyuan and Shen, Yujun and Xu, Yinghao and Zhou, Bolei},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/zhang2023iclr-smooth/}
}