TurboVSR: Fantastic Video Upscalers and Where to Find Them
Abstract
Diffusion-based generative models have demonstrated exceptional promise in the video super-resolution (VSR) task, achieving a substantial advancement in detail generation relative to prior methods. However, these approaches face significant computational efficiency challenges. For instance, current techniques may require tens of minutes to super-resolve a mere 2-second, 1080p video. In this paper, we present TurboVSR, an ultra-efficient diffusion-based video super-resolution model. Our core design comprises three key aspects: (1) We employ an autoencoder with a high compression ratio of 32x32x8 to reduce the number of tokens. (2) Highly compressed latents pose substantial challenges for training. We introduce factorized conditioning to mitigate the learning complexity: we first learn to super-resolve the initial frame; subsequently, we condition the super-resolution of the remaining frames on the high-resolution initial frame and the low-resolution subsequent frames. (3) We convert the pre-trained diffusion model to a shortcut model to enable fewer sampling steps, further accelerating inference. As a result, TurboVSR performs on par with state-of-the-art VSR methods, while being 100+ times faster, taking only 7 seconds to process a 2-second long 1080p video. TurboVSR also supports image resolution by considering image as a one-frame video. Our efficient design makes SR beyond 1080p possible, results on 4K (3648x2048) image SR show surprising fine details.
Cite
Text
Wang et al. "TurboVSR: Fantastic Video Upscalers and Where to Find Them." International Conference on Computer Vision, 2025.Markdown
[Wang et al. "TurboVSR: Fantastic Video Upscalers and Where to Find Them." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/wang2025iccv-turbovsr/)BibTeX
@inproceedings{wang2025iccv-turbovsr,
title = {{TurboVSR: Fantastic Video Upscalers and Where to Find Them}},
author = {Wang, Zhongdao and Zhao, Guodongfang and Ren, Jingjing and Feng, Bailan and Zhang, Shifeng and Li, Wenbo},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {18132-18142},
url = {https://mlanthology.org/iccv/2025/wang2025iccv-turbovsr/}
}