SwiniPASSR: Swin Transformer Based Parallax Attention Network for Stereo Image Super-Resolution
Abstract
With binocular cameras now widely adopted, stereo image super-resolution (stereo SR) has received increasing attention. Unlike single image super-resolution (SISR), stereo SR is more challenging because it must exploit both intra-view and cross-view information. Although prior convolution-based works have made notable progress, few have explored Transformer-based architectures for stereo image SR, even though Transformers have shown promising performance on several vision tasks. In this paper, we propose SwiniPASSR, which adopts the Swin Transformer as its backbone and combines it with a bi-directional parallax attention module (biPAM) to make full use of the auxiliary information provided by the binocular setup. Although the Transformer and the parallax attention mechanism (PAM) have each proved useful in prior studies, we find that simply integrating a convolution-based PAM with a Transformer, or directly optimizing for the stereo SR problem, may not achieve desirable results. We therefore introduce a conversion layer to resolve the integration problem and adopt a progressive training strategy to learn disparity correspondence through progressively enlarged receptive fields. Extensive experiments and ablation studies demonstrate the effectiveness of the proposed SwiniPASSR. In particular, in the NTIRE 2022 Stereo Image Super-Resolution Challenge, we report 23.71 dB PSNR and 0.7295 SSIM, which ranked 2nd on the leaderboard. Source code is available at https://github.com/SMI-Lab/SwinlPASSR.
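As a rough illustration of the parallax attention idea mentioned in the abstract, the sketch below computes attention between left-view and right-view features along each image row (the epipolar line for rectified stereo pairs) in both directions. It is a minimal NumPy sketch under stated assumptions, not the authors' biPAM: the function name `bidirectional_parallax_attention`, the `(H, W, C)` feature layout, and the `sqrt(C)` scaling are all hypothetical choices, and it omits the learned projections, the conversion layer, and the Swin Transformer backbone.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bidirectional_parallax_attention(feat_left, feat_right):
    """Row-wise cross-view attention in both directions (illustrative only).

    feat_left, feat_right: arrays of shape (H, W, C) from a rectified pair.
    Returns (right_warped_to_left, left_warped_to_right), both (H, W, C).
    """
    channels = feat_left.shape[-1]

    # scores[h, i, j]: similarity between left pixel (h, i) and right pixel (h, j).
    # The sqrt(C) scaling is an assumption borrowed from dot-product attention.
    scores = np.einsum("hic,hjc->hij", feat_left, feat_right) / np.sqrt(channels)

    # Each left pixel attends over all right pixels in the same row.
    att_right_to_left = softmax(scores, axis=2)
    # Each right pixel attends over all left pixels in the same row.
    att_left_to_right = softmax(scores.transpose(0, 2, 1), axis=2)

    # Aggregate features from the other view using the attention weights.
    right_warped_to_left = np.einsum("hij,hjc->hic", att_right_to_left, feat_right)
    left_warped_to_right = np.einsum("hji,hic->hjc", att_left_to_right, feat_left)
    return right_warped_to_left, left_warped_to_right
```

Restricting attention to a single row keeps the cost at one W×W attention map per row instead of a full (H·W)×(H·W) map, which is the usual motivation for attending along the epipolar line.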
Cite
Text
Jin et al. "SwiniPASSR: Swin Transformer Based Parallax Attention Network for Stereo Image Super-Resolution." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00106
Markdown
[Jin et al. "SwiniPASSR: Swin Transformer Based Parallax Attention Network for Stereo Image Super-Resolution." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/jin2022cvprw-swinipassr/) doi:10.1109/CVPRW56347.2022.00106
BibTeX
@inproceedings{jin2022cvprw-swinipassr,
title = {{SwiniPASSR: Swin Transformer Based Parallax Attention Network for Stereo Image Super-Resolution}},
author = {Jin, Kai and Wei, Zeqiang and Yang, Angulia and Guo, Sha and Gao, Mingzhi and Zhou, Xiuzhuang and Guo, Guodong},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2022},
pages = {919-928},
doi = {10.1109/CVPRW56347.2022.00106},
url = {https://mlanthology.org/cvprw/2022/jin2022cvprw-swinipassr/}
}