Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment
Abstract
Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content. To effectively model these complex quality-related factors, in this paper, we decompose a video into three levels (i.e., patch level, frame level, and clip level) and propose a novel Zoom-VQA architecture to perceive spatio-temporal features at different levels. It integrates three components: a patch attention module, frame pyramid alignment, and a clip ensemble strategy, which respectively capture regions of interest in the spatial dimension, multi-level information at different feature levels, and distortions distributed over the temporal dimension. Owing to this comprehensive design, Zoom-VQA obtains state-of-the-art results on four VQA benchmarks and achieves 2nd place in the NTIRE 2023 VQA challenge. Notably, Zoom-VQA outperforms the previous best results on two subsets of LSVQ, achieving SRCC scores of 0.8860 (+1.0%) and 0.7985 (+1.9%) on the respective subsets. Extensive ablation studies further verify the effectiveness of each component. Code and models are released at https://github.com/k-zha14/Zoom-VQA.
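To make the three-level decomposition concrete, below is a minimal PyTorch sketch of how the three components could fit together. It is an illustrative reconstruction, not the authors' released code: the class names (PatchAttention, FramePyramid, clip_ensemble), feature dimensions, and pooling choices are all assumptions; see the linked repository for the actual implementation.

```python
# Minimal sketch of the patch / frame / clip decomposition described above.
# All names, dimensions, and fusion choices are assumptions for illustration.
import torch
import torch.nn as nn


class PatchAttention(nn.Module):
    """Weights spatial patch tokens so salient regions dominate the frame
    feature (hypothetical stand-in for the paper's patch attention module)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim) patch tokens from a frame backbone
        w = torch.softmax(self.score(x), dim=1)  # attention over patches
        return (w * x).sum(dim=1)                # (batch, dim) frame feature


class FramePyramid(nn.Module):
    """Projects and averages features from several backbone stages so both
    low-level texture and high-level semantics reach the quality head
    (the fusion scheme here is assumed, not the paper's exact alignment)."""

    def __init__(self, stage_dims, out_dim: int):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, out_dim) for d in stage_dims)

    def forward(self, stage_feats):
        # stage_feats: list of (batch, dim_i) features, one per stage
        return torch.stack(
            [p(f) for p, f in zip(self.proj, stage_feats)]
        ).mean(0)


def clip_ensemble(clip_scores: torch.Tensor) -> torch.Tensor:
    """Averages per-clip predictions sampled along the temporal axis so
    distortions spread over time all contribute (simple mean ensemble)."""
    return clip_scores.mean()


if __name__ == "__main__":
    patches = torch.randn(2, 196, 768)         # toy ViT-style patch tokens
    frame_feat = PatchAttention(768)(patches)  # (2, 768) per-frame feature
    fused = FramePyramid([768, 768], 256)([frame_feat, frame_feat])
    score = clip_ensemble(torch.tensor([3.9, 4.1, 4.0]))
    print(fused.shape, score.item())           # torch.Size([2, 256]), ~4.0
```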
Cite
Text
Zhao et al. "Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00137

Markdown

[Zhao et al. "Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/zhao2023cvprw-zoomvqa/) doi:10.1109/CVPRW59228.2023.00137

BibTeX
@inproceedings{zhao2023cvprw-zoomvqa,
title = {{Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment}},
author = {Zhao, Kai and Yuan, Kun and Sun, Ming and Wen, Xing},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2023},
pages = {1302--1310},
doi = {10.1109/CVPRW59228.2023.00137},
url = {https://mlanthology.org/cvprw/2023/zhao2023cvprw-zoomvqa/}
}