Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers
Abstract
Vision Transformers (ViTs) excel in computer vision tasks due to their ability to capture global context among tokens. However, their quadratic complexity $\mathcal{O}(N^2D)$ in the token number $N$ and feature dimension $D$ limits practical use on mobile devices, calling for more mobile-friendly ViTs with reduced latency. Multi-head linear-attention is emerging as a promising alternative with linear complexity $\mathcal{O}(NDd)$, where $d$ is the per-head dimension. Still, its compute grows when $d$ is enlarged for accuracy, whereas reducing $d$ improves mobile efficiency at the cost of an excessive number of small heads that are weak at learning valuable subspaces, ultimately impeding model capability. To overcome this efficiency-capability dilemma, we propose a novel Mobile-Attention design with a head-competition mechanism empowered by information flow, which prevents overemphasis on less important subspaces by trivial heads while preserving essential subspaces to ensure the Transformer's capability. It enables linear-time complexity on mobile devices by supporting a small per-head dimension $d$ for mobile efficiency. By replacing the standard attention of ViTs with Mobile-Attention, our optimized ViTs achieve enhanced model capacity and competitive performance in a range of computer vision tasks, with remarkable latency reductions on the iPhone 12. Code is available at https://github.com/thuml/MobileAttention.
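For context, below is a minimal sketch of generic multi-head linear attention, illustrating why its cost scales as $\mathcal{O}(NDd)$ rather than $\mathcal{O}(N^2D)$. This is not the paper's Mobile-Attention or its head-competition mechanism; the kernel feature map (elu + 1) and the tensor shapes are assumptions chosen for illustration.

```python
# Minimal sketch of multi-head LINEAR attention (assumption: elu+1 feature map),
# not the paper's Mobile-Attention head-competition mechanism.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: [batch, heads, N, d], where d is the per-head dimension.
    q = F.elu(q) + 1  # non-negative kernel feature map
    k = F.elu(k) + 1
    # Aggregate keys and values first: [B, H, d, d], costing O(N * d^2) per head.
    kv = torch.einsum("bhnd,bhne->bhde", k, v)
    # Normalizer: q dotted with the sum of keys, [B, H, N].
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    # Apply queries to the aggregated state: O(N * d^2) per head,
    # i.e. O(N * D * d) over all H heads since D = H * d.
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

# Usage: 196 tokens (14x14 patches), 8 heads of dimension 32 (D = 256).
q = torch.randn(1, 8, 196, 32)
k = torch.randn(1, 8, 196, 32)
v = torch.randn(1, 8, 196, 32)
out = linear_attention(q, k, v)  # [1, 8, 196, 32]
```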
Cite
Text
Yao et al. "Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers." International Conference on Machine Learning, 2024.
Markdown
[Yao et al. "Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/yao2024icml-mobile/)
BibTeX
@inproceedings{yao2024icml-mobile,
title = {{Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers}},
author = {Yao, Zhiyu and Wang, Jian and Wu, Haixu and Wang, Jingdong and Long, Mingsheng},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {56914-56926},
volume = {235},
url = {https://mlanthology.org/icml/2024/yao2024icml-mobile/}
}