MUSIQ: Multi-Scale Image Quality Transformer

Abstract

Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed input shape required for batch training: to satisfy this constraint, input images are usually resized and cropped to a fixed shape, which degrades image quality. To address this, we design a multi-scale image quality Transformer (MUSIQ) that processes native-resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support positional embedding in the multi-scale representation. Experimental results verify that our method achieves state-of-the-art performance on multiple large-scale IQA datasets, such as PaQ-2-PiQ, SPAQ, and KonIQ-10k.
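The abstract does not spell out how the multi-scale input and the hash-based 2D spatial embedding fit together, so the sketch below illustrates one plausible reading. Each image is kept at its native resolution and also resized to a few aspect-ratio-preserving scales. Every patch at grid position (i, j) in an H x W patch grid is "hashed" onto a fixed G x G table of learnable spatial embeddings via t_i = floor(i * G / H) and t_j = floor(j * G / W), and each token also carries a scale index for a separate scale embedding. The values G = 10, patch size 32, longer sides 224 and 384, and rounding partial patches up with `ceil` are assumptions for illustration, as are the helper names `resize_keep_aspect`, `hashed_grid_positions`, and `multiscale_tokens`. This only computes embedding indices; it is not the authors' implementation.

```python
import math


def resize_keep_aspect(h, w, longer_side):
    """Return (new_h, new_w) so the longer side equals longer_side,
    preserving the aspect ratio (hypothetical helper)."""
    scale = longer_side / max(h, w)
    return max(1, round(h * scale)), max(1, round(w * scale))


def hashed_grid_positions(n_rows, n_cols, grid_size):
    """Map each patch (i, j) of an n_rows x n_cols patch grid onto a fixed
    grid_size x grid_size embedding table:
        t_i = floor(i * G / H), t_j = floor(j * G / W).
    Integer floor division is exact here because all values are non-negative."""
    return [
        [(i * grid_size // n_rows, j * grid_size // n_cols) for j in range(n_cols)]
        for i in range(n_rows)
    ]


def multiscale_tokens(h, w, longer_sides=(224, 384), patch=32, grid_size=10):
    """Build one (scale_index, t_i, t_j) triple per patch for every scale.

    Scales 0..len(longer_sides)-1 are aspect-preserving resizes; the last
    scale is the native resolution. scale_index would select a scale
    embedding and (t_i, t_j) a hashed spatial embedding (assumed setup)."""
    sizes = [resize_keep_aspect(h, w, L) for L in longer_sides] + [(h, w)]
    tokens = []
    for s, (sh, sw) in enumerate(sizes):
        # Assumption: a partial patch at the border still counts as a patch.
        rows, cols = math.ceil(sh / patch), math.ceil(sw / patch)
        for row in hashed_grid_positions(rows, cols, grid_size):
            for ti, tj in row:
                tokens.append((s, ti, tj))
    return sizes, tokens


if __name__ == "__main__":
    sizes, tokens = multiscale_tokens(768, 1024)
    print(sizes)        # [(168, 224), (288, 384), (768, 1024)]
    print(len(tokens))  # 918 = 6*7 + 9*12 + 24*32 patches
```

Because every patch index is mapped into the same G x G table regardless of the image's patch-grid size, images of any resolution or aspect ratio can share one fixed-size set of spatial embeddings, and patches from different scales that cover the same region of the image receive the same spatial index while the scale index keeps them distinguishable.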

Cite

Text

Ke et al. "MUSIQ: Multi-Scale Image Quality Transformer." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00510

Markdown

[Ke et al. "MUSIQ: Multi-Scale Image Quality Transformer." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/ke2021iccv-musiq/) doi:10.1109/ICCV48922.2021.00510

BibTeX

@inproceedings{ke2021iccv-musiq,
  title     = {{MUSIQ: Multi-Scale Image Quality Transformer}},
  author    = {Ke, Junjie and Wang, Qifei and Wang, Yilin and Milanfar, Peyman and Yang, Feng},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {5148-5157},
  doi       = {10.1109/ICCV48922.2021.00510},
  url       = {https://mlanthology.org/iccv/2021/ke2021iccv-musiq/}
}