Learned Video Compression with Feature-Level Residuals
Abstract
In this paper, we present an end-to-end video compression network for the P-frame challenge of CLIC. We focus on deep neural network (DNN) based video compression and improve current frameworks in three aspects. First, we observe that pixel-space residuals are sensitive to the prediction errors of optical-flow-based motion compensation. To suppress this influence, we propose to compress the residuals of image features rather than the residuals of image pixels. Second, we combine the advantages of both pixel-level and feature-level residual compression through model ensembling. Finally, we propose a step-by-step training strategy to improve the training efficiency of the whole framework. Experimental results indicate that our proposed method achieves 0.9968 MS-SSIM on the CLIC validation set and 0.9967 MS-SSIM on the test set.
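The core idea can be illustrated with a minimal sketch: instead of compressing the pixel-space difference between the current frame and its motion-compensated prediction, the difference is taken in a learned feature space. Here a fixed random linear map `W` is a hypothetical stand-in for the paper's trained CNN encoder, and the frames are random arrays rather than real video.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned feature extractor:
# a fixed random linear map (the paper uses a trained CNN encoder).
W = rng.standard_normal((16, 64))

def features(frame):
    """Map a flattened 8x8 frame into a 16-dim feature vector."""
    return W @ frame.reshape(-1)

target = rng.standard_normal((8, 8))                      # current frame
predicted = target + 0.01 * rng.standard_normal((8, 8))   # motion-compensated prediction

# Pixel-level residual: what prior frameworks compress directly;
# it inherits every local error of the optical-flow prediction.
pixel_residual = target - predicted

# Feature-level residual: the quantity this paper proposes to compress,
# computed between features of the target and the predicted frame.
feature_residual = features(target) - features(predicted)

print(pixel_residual.shape)    # (8, 8)
print(feature_residual.shape)  # (16,)
```

The decoder side would reverse the process: reconstruct the features of the predicted frame, add the decoded feature residual, and synthesize the frame from the result.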
Cite
Text

Feng et al. "Learned Video Compression with Feature-Level Residuals." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00068

Markdown

[Feng et al. "Learned Video Compression with Feature-Level Residuals." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/feng2020cvprw-learned/) doi:10.1109/CVPRW50498.2020.00068

BibTeX
@inproceedings{feng2020cvprw-learned,
title = {{Learned Video Compression with Feature-Level Residuals}},
author = {Feng, Runsen and Wu, Yaojun and Guo, Zongyu and Zhang, Zhizheng and Chen, Zhibo},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2020},
pages = {529-532},
doi = {10.1109/CVPRW50498.2020.00068},
url = {https://mlanthology.org/cvprw/2020/feng2020cvprw-learned/}
}