Practical Stacked Non-Local Attention Modules for Image Compression

Abstract

In this paper, we propose a stacked non-local attention based variational autoencoder (VAE) for learned image compression. We use a non-local module to effectively capture the global correlations that traditional convolutional neural networks (CNNs) cannot model. Meanwhile, layer-wise self-attention mechanisms are widely applied to activate and preserve important and challenging regions. We jointly exploit hyperpriors and autoregressive priors for conditional probability estimation. For practical application, we implement sparse non-local processing via maxpooling to greatly reduce memory consumption, and masked 3D convolutions to support parallel processing in the autoregressive-prior-based probability prediction. A post-processing network is then concatenated and trained jointly with the decoder for quality enhancement. We evaluated our model on the public CLIC2019 validation and test datasets, achieving average multi-scale structural similarity (MS-SSIM) scores of 0.9753 and 0.9733, respectively, at bit rates below 0.15 bits per pixel (bpp).
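The memory saving from the maxpooling trick can be illustrated with a minimal sketch: in a non-local (self-attention) block, pooling only the key/value positions shrinks the affinity matrix from (HW × HW) to (HW × H'W'), roughly a pool² reduction. This NumPy sketch is an assumption-laden illustration of that idea, not the paper's actual layer (embedding convolutions, channel shapes, and normalization details are omitted):

```python
import numpy as np

def maxpool2d(x, k):
    # x: (C, H, W); non-overlapping k x k max pooling
    C, H, W = x.shape
    x = x[:, :H - H % k, :W - W % k]
    x = x.reshape(C, x.shape[1] // k, k, x.shape[2] // k, k)
    return x.max(axis=(2, 4))

def sparse_non_local(x, pool=2):
    """Non-local block with maxpooled keys/values (illustrative sketch).

    Queries come from all HW positions; keys/values come from the
    pooled H'W' positions, so the (HW x H'W') affinity matrix is
    roughly pool**2 smaller than the dense (HW x HW) one.
    """
    C, H, W = x.shape
    q = x.reshape(C, H * W)                  # queries: all positions
    kv = maxpool2d(x, pool)                  # keys/values: pooled positions
    k = kv.reshape(C, -1)
    v = kv.reshape(C, -1)
    attn = q.T @ k / np.sqrt(C)              # (HW, H'W') scaled affinity
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax
    y = (v @ attn.T).reshape(C, H, W)        # aggregate pooled values
    return x + y                             # residual connection

x = np.random.rand(8, 16, 16).astype(np.float32)
y = sparse_non_local(x, pool=2)
print(y.shape)  # (8, 16, 16)
```

With pool=2, the affinity matrix here is 256 × 64 instead of 256 × 256, a 4× reduction; in a trained network the queries, keys, and values would of course pass through learned 1×1 convolutions first.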

Cite

Text

Liu et al. "Practical Stacked Non-Local Attention Modules for Image Compression." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.

Markdown

[Liu et al. "Practical Stacked Non-Local Attention Modules for Image Compression." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.](https://mlanthology.org/cvprw/2019/liu2019cvprw-practical/)

BibTeX

@inproceedings{liu2019cvprw-practical,
  title     = {{Practical Stacked Non-Local Attention Modules for Image Compression}},
  author    = {Liu, Haojie and Chen, Tong and Shen, Qiu and Ma, Zhan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2019},
  url       = {https://mlanthology.org/cvprw/2019/liu2019cvprw-practical/}
}