Saliency-Guided Transformer Network Combined with Local Embedding for No-Reference Image Quality Assessment

Abstract

No-Reference Image Quality Assessment (NR-IQA) methods based on the Vision Transformer have recently drawn much attention for their superior performance. Unfortunately, as crude combinations of NR-IQA and the Transformer, they can hardly take full advantage of the strengths of either. In this paper, we propose a novel Saliency-Guided Transformer Network combined with Local Embedding (TranSLA) for No-Reference Image Quality Assessment. TranSLA integrates information from different levels into a robust representation. Existing research has shown that the human visual system concentrates more on the Region of Interest (RoI) when assessing image quality. We therefore combine saliency prediction with the Transformer to guide the model to highlight the RoI when aggregating global information. In addition, we introduce a local embedding for the Transformer derived from a gradient map. Since the gradient map focuses on extracting detailed structural features, it can serve as a supplement that offers local information to the Transformer, so that both local and non-local information can be utilized. Moreover, to accelerate the aggregation of information from all tokens, we introduce a Boosting Interaction Module (BIM) to enhance feature aggregation. BIM forces patch tokens to interact better with class tokens at all levels. Experiments on two large-scale NR-IQA benchmarks demonstrate that our method significantly outperforms state-of-the-art methods.
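To make the notion of a gradient map concrete, the following is a minimal sketch of a gradient-magnitude map for a grayscale image. It is illustrative only: the abstract does not say which gradient operator TranSLA uses, so the Sobel kernels, the edge-replication border handling, and the function name `gradient_map` are all assumptions, not the authors' implementation.

```python
import math

# Sobel kernels (an assumption; the abstract does not name the gradient operator).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_map(image):
    """Return the gradient-magnitude map of a 2-D grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    handled by replicating the nearest edge pixel (a hypothetical choice).
    """
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so the 3x3 window never leaves the image.
        r = min(max(r, 0), height - 1)
        c = min(max(c, 0), width - 1)
        return image[r][c]

    result = []
    for r in range(height):
        row = []
        for c in range(width):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = pixel(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * value
                    gy += SOBEL_Y[dr + 1][dc + 1] * value
            # Magnitude combines horizontal and vertical responses.
            row.append(math.hypot(gx, gy))
        result.append(row)
    return result
```

On a flat image every response is zero, while a vertical step edge yields large values only in the columns adjacent to the edge, which is the kind of local structural cue the abstract describes as a supplement to the Transformer's global aggregation.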

Cite

Text

Zhu et al. "Saliency-Guided Transformer Network Combined with Local Embedding for No-Reference Image Quality Assessment." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00222

Markdown

[Zhu et al. "Saliency-Guided Transformer Network Combined with Local Embedding for No-Reference Image Quality Assessment." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/zhu2021iccvw-saliencyguided/) doi:10.1109/ICCVW54120.2021.00222

BibTeX

@inproceedings{zhu2021iccvw-saliencyguided,
  title     = {{Saliency-Guided Transformer Network Combined with Local Embedding for No-Reference Image Quality Assessment}},
  author    = {Zhu, Mengmeng and Hou, Guanqun and Chen, Xinjia and Xie, Jiaxing and Lu, Haixian and Che, Jun},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2021},
  pages     = {1953--1962},
  doi       = {10.1109/ICCVW54120.2021.00222},
  url       = {https://mlanthology.org/iccvw/2021/zhu2021iccvw-saliencyguided/}
}