Learned Image Compression with Mixed Transformer-CNN Architectures

CVPR 2023 pp. 14388-14397

doi:10.1109/CVPR52729.2023.01383

Abstract

Learned image compression (LIC) methods have exhibited promising progress and superior rate-distortion performance compared with classical image compression standards. Most existing LIC methods are based on either convolutional neural networks (CNNs) or transformers, each of which has different advantages. Exploiting both sets of advantages is worth exploring but raises two challenges: 1) how to fuse the two methods effectively, and 2) how to achieve higher performance at a suitable complexity. In this paper, we propose an efficient parallel Transformer-CNN Mixture (TCM) block with controllable complexity that combines the local modeling ability of CNNs with the non-local modeling ability of transformers to improve the overall architecture of image compression models. In addition, inspired by recent progress in entropy estimation models and attention modules, we propose a channel-wise entropy model with parameter-efficient swin-transformer-based attention (SWAtten) modules that use channel squeezing. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance on three datasets of different resolutions (i.e., Kodak, Tecnick, and CLIC Professional Validation) compared with existing LIC methods. The code is available at https://github.com/jmliu206/LIC_TCM.
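To make the idea of a parallel local/non-local block more concrete, here is a toy sketch in plain Python. It is not the authors' implementation (see the linked repository for that), and the abstract does not spell out the block's wiring. The even channel split, the 3-tap moving average standing in for a convolutional branch, the global dot-product attention standing in for a swin-transformer branch, and the function names `local_branch`, `nonlocal_branch`, and `mixed_block` are all assumptions made for illustration. Learned weights and window partitioning are omitted entirely.

```python
import math


def local_branch(channels):
    """Toy local branch: 3-tap moving average per channel, plus a residual."""
    out = []
    for ch in channels:
        n = len(ch)
        smoothed = [
            (ch[max(i - 1, 0)] + ch[i] + ch[min(i + 1, n - 1)]) / 3.0
            for i in range(n)
        ]
        out.append([s + x for s, x in zip(smoothed, ch)])
    return out


def nonlocal_branch(channels):
    """Toy non-local branch: global dot-product self-attention, plus a residual."""
    n, d = len(channels[0]), len(channels)
    # Each spatial position becomes one token of dimension d.
    tokens = [[ch[i] for ch in channels] for i in range(n)]
    attended = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mixed = [
            sum(w * k[j] for w, k in zip(weights, tokens)) / total
            for j in range(d)
        ]
        attended.append([m + x for m, x in zip(mixed, q)])
    # Convert back from token-major to channel-major layout.
    return [[attended[i][j] for i in range(n)] for j in range(d)]


def mixed_block(channels):
    """Split channels in half, run both branches in parallel, concatenate,
    and add an outer residual connection (assumed wiring)."""
    half = len(channels) // 2
    merged = local_branch(channels[:half]) + nonlocal_branch(channels[half:])
    return [[m + x for m, x in zip(mch, ch)] for mch, ch in zip(merged, channels)]


# Usage: 4 channels, 6 positions each (a 1-D "feature map" for simplicity).
features = [[1.0] * 6 for _ in range(4)]
out = mixed_block(features)
```

Because each branch only sees half of the channels, the cost of the attention branch can be tuned through the split, which is one plausible reading of the "controllable complexity" the abstract mentions.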

Cite

Text

Liu et al. "Learned Image Compression with Mixed Transformer-CNN Architectures." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01383

Markdown

[Liu et al. "Learned Image Compression with Mixed Transformer-CNN Architectures." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/liu2023cvpr-learned/) doi:10.1109/CVPR52729.2023.01383

BibTeX

@inproceedings{liu2023cvpr-learned,
  title     = {{Learned Image Compression with Mixed Transformer-CNN Architectures}},
  author    = {Liu, Jinming and Sun, Heming and Katto, Jiro},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {14388-14397},
  doi       = {10.1109/CVPR52729.2023.01383},
  url       = {https://mlanthology.org/cvpr/2023/liu2023cvpr-learned/}
}