Scaling Graph Convolutions for Mobile Vision

Abstract

To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of graph neural networks (GNNs). However, MobileViG scales poorly with model size, trailing models with similar latency by up to 1%. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Beyond image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 AP^box and 0.7 AP^mask, and MobileViGv2-B outperforms MobileViG-B by 1.0 AP^box and 0.7 AP^mask. For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% mIoU and MobileViGv2-B achieves 44.3% mIoU.

Cite

Text

Avery et al. "Scaling Graph Convolutions for Mobile Vision." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00593

Markdown

[Avery et al. "Scaling Graph Convolutions for Mobile Vision." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/avery2024cvprw-scaling/) doi:10.1109/CVPRW63382.2024.00593

BibTeX

@inproceedings{avery2024cvprw-scaling,
  title     = {{Scaling Graph Convolutions for Mobile Vision}},
  author    = {Avery, William and Munir, Mustafa and Marculescu, Radu},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {5857--5865},
  doi       = {10.1109/CVPRW63382.2024.00593},
  url       = {https://mlanthology.org/cvprw/2024/avery2024cvprw-scaling/}
}