Emulating Self-Attention with Convolution for Efficient Image Super-Resolution
Abstract
In this paper, we tackle the high computational overhead of Transformers for efficient image super-resolution (SR). Motivated by the observation of self-attention's inter-layer repetition, we introduce a convolutionized self-attention module named Convolutional Attention (ConvAttn) that emulates self-attention's long-range modeling capability and instance-dependent weighting with a single shared large kernel and dynamic kernels. By utilizing the ConvAttn module, we significantly reduce the reliance on self-attention and its involved memory-bound operations while maintaining the representational capability of Transformers. Furthermore, we overcome the challenge of integrating flash attention into the lightweight SR regime, effectively mitigating self-attention's inherent memory bottleneck. Rather than proposing an intricate self-attention module, we scale the window size up to 32×32 with flash attention, significantly improving PSNR by 0.31 dB on Urban100 ×2 while reducing latency and memory usage by 16× and 12.2×. Building on these approaches, our proposed network, termed Emulating Self-attention with Convolution (ESC), notably improves PSNR by 0.27 dB on Urban100 ×4 compared to HiT-SRF, reducing latency and memory usage by 3.7× and 6.2×, respectively. Extensive experiments demonstrate that our ESC maintains the long-range modeling ability, data scalability, and representational power of Transformers despite most self-attention layers being replaced by the ConvAttn module.
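The core idea of ConvAttn, as described in the abstract, is to replace self-attention with two convolutional paths: a single large kernel shared across layers for long-range aggregation, and a small dynamic kernel generated from the input for instance-dependent weighting. The following is a minimal single-channel NumPy sketch of that decomposition, not the authors' implementation; the kernel sizes, the `toy_dyn_gen` generator, and all function names are illustrative assumptions.

```python
import numpy as np

def conv2d(x, k):
    """Naive single-channel 'same' convolution (zero padding)."""
    H, W = x.shape
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def toy_dyn_gen(x):
    """Hypothetical dynamic-kernel generator: derives a 3x3 kernel
    from global statistics of the input feature map."""
    k = np.full((3, 3), x.mean())
    k[1, 1] += 1.0  # bias toward the center (identity-like) tap
    return k / k.sum()

def conv_attn(x, shared_large_kernel, dyn_gen=toy_dyn_gen):
    """Sketch of ConvAttn: shared large kernel emulates long-range
    modeling; an input-conditioned kernel emulates instance-dependent
    weighting. The two responses are summed."""
    long_range = conv2d(x, shared_large_kernel)      # shared across layers
    local_dynamic = conv2d(x, dyn_gen(x))            # generated per input
    return long_range + local_dynamic
```

In the paper's setting the shared large kernel would be reused by every ConvAttn layer, which is what removes the per-layer attention cost; here that sharing is simply the caller passing the same `shared_large_kernel` array to each call.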
Cite
Text
Lee et al. "Emulating Self-Attention with Convolution for Efficient Image Super-Resolution." International Conference on Computer Vision, 2025.

Markdown

[Lee et al. "Emulating Self-Attention with Convolution for Efficient Image Super-Resolution." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/lee2025iccv-emulating/)

BibTeX
@inproceedings{lee2025iccv-emulating,
title = {{Emulating Self-Attention with Convolution for Efficient Image Super-Resolution}},
author = {Lee, Dongheon and Yun, Seokju and Ro, Youngmin},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {24467--24477},
url = {https://mlanthology.org/iccv/2025/lee2025iccv-emulating/}
}