ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition
Abstract
Image super-resolution (ISR) is a classic and challenging problem in computer vision because of complex and unknown degradation patterns in the data collection process. Leveraging powerful generative priors, diffusion-based methods have recently established new state-of-the-art ISR performance, but their characteristics in the frequency domain are still underexplored. In this paper, we investigate their frequency-domain behaviors from a sampling timestep perspective. Experimentally, we find that current diffusion-based ISR algorithms handle different frequency components inadequately in distinct groups of timesteps during sampling. To address this, we first propose a Timestep Division Controller that adaptively divides the timesteps into groups based on the performance gradient across different components. Next, we design two dedicated modules, the Amplitude and Phase Enhancement Module (APEM) and the High- and Low-Frequency Enhancement Module (HLEM), to regulate the information flow of distinct frequency-domain features. By adaptively enhancing specific frequency components at different stages of the sampling process, the two modules compensate for the insufficient frequency-domain perception of diffusion-based ISR models. Extensive experiments on three benchmark datasets verify the superior ISR performance of our method, e.g., achieving an average 5.40% improvement on CLIP-IQA compared to the best diffusion-based ISR baseline.
Cite
Text
Xue et al. "ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/168
Markdown
[Xue et al. "ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/xue2024ijcai-protopformer/) doi:10.24963/ijcai.2024/168
BibTeX
@inproceedings{xue2024ijcai-protopformer,
title = {{ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition}},
author = {Xue, Mengqi and Huang, Qihan and Zhang, Haofei and Hu, Jingwen and Song, Jie and Song, Mingli and Jin, Canghong},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2024},
pages = {1516-1524},
doi = {10.24963/ijcai.2024/168},
url = {https://mlanthology.org/ijcai/2024/xue2024ijcai-protopformer/}
}