FVD: A New Metric for Video Generation

Abstract

Recent advances in deep generative models have led to remarkable progress in synthesizing high-quality images. Following their successful application in image processing and representation learning, an important next step is to consider videos. Learning generative models of video is a much harder task, requiring a model to capture the temporal dynamics of a scene, in addition to the visual presentation of objects. While recent generative models of video have had some success, current progress is hampered by the lack of qualitative metrics that consider visual quality, temporal coherence, and diversity of samples. To this end we propose Fréchet Video Distance (FVD), a new metric for generative models of video based on FID. We contribute a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos.
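
The abstract describes FVD as based on FID. As a rough illustration, the sketch below assumes the standard FID formulation: fit a Gaussian to feature embeddings of real samples and another to embeddings of generated samples, then compute the Fréchet distance between the two Gaussians. The feature extractor is not shown, and the function name `frechet_distance` and its array inputs are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim), assumed
    to come from some pretrained feature extractor (not shown here).
    """
    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    # atleast_2d keeps the code valid when feature_dim == 1.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices. Small negative or imaginary parts are numerical
    # noise, so we keep the real part and clip at zero.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Under these assumptions, two identical feature sets give a distance of approximately zero, and shifting every feature vector by a constant offset adds the squared length of that offset, since the covariance terms cancel.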

Cite

Text

Unterthiner et al. "FVD: A New Metric for Video Generation." ICLR 2019 Workshops: DeepGenStruct, 2019.

Markdown

[Unterthiner et al. "FVD: A New Metric for Video Generation." ICLR 2019 Workshops: DeepGenStruct, 2019.](https://mlanthology.org/iclrw/2019/unterthiner2019iclrw-fvd/)

BibTeX

@inproceedings{unterthiner2019iclrw-fvd,
  title     = {{FVD: A New Metric for Video Generation}},
  author    = {Unterthiner, Thomas and van Steenkiste, Sjoerd and Kurach, Karol and Marinier, Raphaël and Michalski, Marcin and Gelly, Sylvain},
  booktitle = {ICLR 2019 Workshops: DeepGenStruct},
  year      = {2019},
  url       = {https://mlanthology.org/iclrw/2019/unterthiner2019iclrw-fvd/}
}