Short-Form UGC Video Quality Assessment Based on Multi-Level Video Fusion with Rank-Aware

Abstract

Short-form UGC video platforms such as Kwai and TikTok have developed rapidly. However, the wide variety of short-video content and its uneven quality make manual quality annotation labor-intensive. In this paper, from the perspective of data augmentation and multi-level fusion, video is decomposed into three levels (frame level, segment level, and video level), and a new integrated framework is proposed to capture the spatial-temporal characteristics and relative rank information at each level. The framework combines a spatial-temporal data augmentation strategy, multi-level feature fusion, an adaptive rank-aware loss, and a redistributed model ensemble across all levels. These components allow our method not only to capture features at each level but also to mitigate the difficulty of identifying the relative rank of the two kinds of hard samples. Our framework achieved 5th place among all methods in the NTIRE 2024 Short-form UGC Video Quality Assessment Challenge. Extensive experiments show that our framework performs well not only on the KVQ dataset but also on other benchmark VQA datasets, demonstrating its generalization and superiority.
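The abstract does not spell out the form of the adaptive rank-aware loss. A common instantiation of a rank-aware objective in VQA is a pairwise margin ranking loss over predicted quality scores, penalizing pairs whose predicted ordering disagrees with (or is too close under) the ground-truth mean opinion scores. The minimal sketch below illustrates that idea only; the function name, the fixed margin, and the uniform pair weighting are assumptions, not the paper's exact formulation.

```python
def pairwise_rank_loss(pred_scores, mos_scores, margin=0.5):
    """Hinge-style pairwise ranking loss (illustrative sketch, not the
    paper's adaptive variant): for every pair (i, j) whose ground-truth
    MOS says clip i is better than clip j, penalize the model when its
    predicted score gap pred[i] - pred[j] falls short of `margin`."""
    loss, pairs = 0.0, 0
    n = len(pred_scores)
    for i in range(n):
        for j in range(n):
            if mos_scores[i] > mos_scores[j]:
                loss += max(0.0, margin - (pred_scores[i] - pred_scores[j]))
                pairs += 1
    # Average over counted pairs; zero pairs (all-equal MOS) yields zero loss.
    return loss / max(pairs, 1)
```

When predictions already separate every better/worse pair by at least the margin, the loss is zero; misordered or too-close pairs contribute a hinge penalty, which is what pushes the model to respect relative rank on hard samples.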

Cite

Text

Xu et al. "Short-Form UGC Video Quality Assessment Based on Multi-Level Video Fusion with Rank-Aware." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00633

Markdown

[Xu et al. "Short-Form UGC Video Quality Assessment Based on Multi-Level Video Fusion with Rank-Aware." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/xu2024cvprw-shortform/) doi:10.1109/CVPRW63382.2024.00633

BibTeX

@inproceedings{xu2024cvprw-shortform,
  title     = {{Short-Form UGC Video Quality Assessment Based on Multi-Level Video Fusion with Rank-Aware}},
  author    = {Xu, Haoran and Yang, Mengduo and Zhou, Jie and Li, Jiaze},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {6297--6306},
  doi       = {10.1109/CVPRW63382.2024.00633},
  url       = {https://mlanthology.org/cvprw/2024/xu2024cvprw-shortform/}
}