MobileNetV4: Universal Models for the Mobile Ecosystem
Abstract
We present the latest generation of MobileNets: MobileNetV4 (MNv4). They feature universally-efficient architecture designs for mobile devices. We introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block for mobile accelerators, delivering a significant 39% speedup. An optimized neural architecture search (NAS) recipe is also introduced which improves MNv4 search effectiveness. The integration of UIB, Mobile MQA and the refined NAS recipe results in a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, as well as accelerators like Apple Neural Engine and Google Pixel EdgeTPU. This performance uniformity is not found in any other models tested. We introduce performance modeling and analysis techniques to explain how this performance is achieved. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of 3.8ms.
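The abstract says that UIB merges four block types without saying how. The following is a minimal sketch, assuming the description given in the full paper: a 1x1 expansion followed by a 1x1 projection, with two optional depthwise convolutions, one before the expansion and one between expansion and projection. The names `UIBConfig`, `uib_variant`, and `uib_layers` are hypothetical and chosen only for illustration; they are not taken from any released implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UIBConfig:
    """Hypothetical switchboard for the two optional depthwise convs in a UIB block."""
    start_dw: bool   # depthwise conv placed before the 1x1 expansion
    middle_dw: bool  # depthwise conv placed between expansion and projection


def uib_variant(cfg: UIBConfig) -> str:
    """Map the two switches to the variant names listed in the abstract."""
    if cfg.start_dw and cfg.middle_dw:
        return "ExtraDW"
    if cfg.middle_dw:
        return "IB"
    if cfg.start_dw:
        return "ConvNext"
    return "FFN"


def uib_layers(cfg: UIBConfig) -> List[str]:
    """List the layer sequence implied by a configuration (assumed ordering)."""
    layers: List[str] = []
    if cfg.start_dw:
        layers.append("depthwise_conv")
    layers.append("pointwise_expand")
    if cfg.middle_dw:
        layers.append("depthwise_conv")
    layers.append("pointwise_project")
    return layers


if __name__ == "__main__":
    for start in (False, True):
        for middle in (False, True):
            cfg = UIBConfig(start_dw=start, middle_dw=middle)
            print(uib_variant(cfg), uib_layers(cfg))
```

Under this reading, a search procedure only needs to choose two booleans per block (plus kernel sizes and expansion ratio, omitted here) to cover all four variants.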
Cite
Text
Qin et al. "MobileNetV4: Universal Models for the Mobile Ecosystem." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73661-2_5
Markdown
[Qin et al. "MobileNetV4: Universal Models for the Mobile Ecosystem." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/qin2024eccv-mobilenetv4/) doi:10.1007/978-3-031-73661-2_5
BibTeX
@inproceedings{qin2024eccv-mobilenetv4,
title = {{MobileNetV4: Universal Models for the Mobile Ecosystem}},
author = {Qin, Danfeng and Leichner, Chas H and Delakis, Manolis and Fornoni, Marco and Luo, Shixin and Yang, Fan and Wang, Weijun and Banbury, Colby and Ye, Chengxi and Akin, Berkin and Aggarwal, Vaibhav and Zhu, Tenghui and Moro, Daniele and Howard, Andrew},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73661-2_5},
url = {https://mlanthology.org/eccv/2024/qin2024eccv-mobilenetv4/}
}