VideoGigaGAN: Towards Detail-Rich Video Super-Resolution

Abstract

Video super-resolution (VSR) models achieve temporal consistency but often produce blurrier results than their image-based counterparts due to limited generative capacity. This prompts the question: can we adapt a generative image upsampler for VSR while preserving temporal consistency? We introduce VideoGigaGAN, a new generative VSR model that combines high-frequency detail with temporal stability, building on the large-scale GigaGAN image upsampler. Simple adaptations of GigaGAN for VSR lead to temporal flickering, so we propose techniques to enhance temporal consistency. We validate the effectiveness of VideoGigaGAN by comparing it with state-of-the-art VSR models on public datasets and showcasing video results with 8× upsampling.

Cite

Text

Xu et al. "VideoGigaGAN: Towards Detail-Rich Video Super-Resolution." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00205

Markdown

[Xu et al. "VideoGigaGAN: Towards Detail-Rich Video Super-Resolution." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/xu2025cvpr-videogigagan/) doi:10.1109/CVPR52734.2025.00205

BibTeX

@inproceedings{xu2025cvpr-videogigagan,
  title     = {{VideoGigaGAN: Towards Detail-Rich Video Super-Resolution}},
  author    = {Xu, Yiran and Park, Taesung and Zhang, Richard and Zhou, Yang and Shechtman, Eli and Liu, Feng and Huang, Jia-Bin and Liu, Difan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {2139--2149},
  doi       = {10.1109/CVPR52734.2025.00205},
  url       = {https://mlanthology.org/cvpr/2025/xu2025cvpr-videogigagan/}
}