On the Content Bias in Frechet Video Distance

Abstract

Frechet Video Distance (FVD), a prominent metric for evaluating video generation models, is known to conflict with human perception occasionally. In this paper, we aim to explore the extent of FVD's bias toward frame quality over temporal realism and identify its sources. We first quantify the FVD's sensitivity to the temporal axis by decoupling the frame and motion quality and find that the FVD only increases slightly with larger temporal corruption. We then analyze the generated videos and show that, via careful sampling from a large set of generated videos that do not contain motions, one can drastically decrease FVD without improving the temporal quality. Both studies suggest FVD's bias toward the quality of individual frames. We show that FVD with features extracted from the recent large-scale self-supervised video models is less biased toward image quality. Finally, we revisit a few real-world examples to validate our hypothesis.
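
For readers unfamiliar with the metric, FVD is a Frechet distance between Gaussians fitted to features of real and generated videos. The sketch below is illustrative only and is not the authors' code: the function name `frechet_distance` is hypothetical, and random arrays stand in for features that would in practice come from a pretrained video network. Swapping the feature extractor, as the abstract proposes, changes only the inputs to this function.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets of shape (N, D)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Placeholder features (assumption): 512 "videos" with 8-dimensional features each.
rng = np.random.default_rng(0)
real = rng.normal(size=(512, 8))
fake = rng.normal(loc=0.5, size=(512, 8))

d_same = frechet_distance(real, real)  # identical sets: distance is numerically zero
d_diff = frechet_distance(real, fake)  # shifted mean: distance is clearly positive
```

Because the distance depends only on feature means and covariances, any bias in what the feature extractor encodes (frame appearance versus motion) carries over directly to the score.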

Cite

Text

Ge et al. "On the Content Bias in Frechet Video Distance." Conference on Computer Vision and Pattern Recognition, 2024.

Markdown

[Ge et al. "On the Content Bias in Frechet Video Distance." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/ge2024cvpr-content/)

BibTeX

@inproceedings{ge2024cvpr-content,
  title     = {{On the Content Bias in Frechet Video Distance}},
  author    = {Ge, Songwei and Mahapatra, Aniruddha and Parmar, Gaurav and Zhu, Jun-Yan and Huang, Jia-Bin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {7277-7288},
  url       = {https://mlanthology.org/cvpr/2024/ge2024cvpr-content/}
}