Scaling Graph Convolutions for Mobile Vision
Abstract
To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling as much as 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Beyond image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 AP box and 0.7 AP mask, and MobileViGv2-B outperforms MobileViG-B by 1.0 AP box and 0.7 AP mask. For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% mIoU and MobileViGv2-B achieves 44.3% mIoU.
Cite
Text
Avery et al. "Scaling Graph Convolutions for Mobile Vision." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00593
Markdown
[Avery et al. "Scaling Graph Convolutions for Mobile Vision." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/avery2024cvprw-scaling/) doi:10.1109/CVPRW63382.2024.00593
BibTeX
@inproceedings{avery2024cvprw-scaling,
title = {{Scaling Graph Convolutions for Mobile Vision}},
author = {Avery, William and Munir, Mustafa and Marculescu, Radu},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {5857--5865},
doi = {10.1109/CVPRW63382.2024.00593},
url = {https://mlanthology.org/cvprw/2024/avery2024cvprw-scaling/}
}