SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles

Ran Xu, Fangzhou Mu, Jayoung Lee, Preeti Mukherjee, Somali Chaterji, Saurabh Bagchi, Yin Li

CVPR 2022 pp. 2528-2538

doi:10.1109/CVPR52688.2022.00256

Abstract

Several recent works seek to create lightweight deep networks for video object detection on mobiles. We observe that many existing detectors, previously deemed computationally costly for mobiles, intrinsically support adaptive inference, and offer a multi-branch object detection framework (MBODF). Here, an MBODF refers to a solution that has many execution branches, from which one can dynamically choose at inference time to satisfy varying latency requirements (e.g., by varying the resolution of an input frame). In this paper, we ask, and answer, the wide-ranging question across all MBODFs: how to expose the right set of execution branches, and then how to schedule the optimal one at inference time? In addition, we uncover the importance of making a content-aware decision on which branch to run, as the optimal one is conditioned on the video content. Finally, we explore content-aware schedulers: first an Oracle one, and then a practical one that leverages various lightweight feature extractors. Our evaluation shows that, layered on a Faster R-CNN-based MBODF and compared to 7 baselines, our SMARTADAPT achieves a higher Pareto optimal curve in the accuracy-vs-latency space for the ILSVRC VID dataset.
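The branch-selection idea in the abstract can be illustrated with a minimal sketch. This is not the paper's scheduler: the `Branch` type, the `pick_branch` function, the branch names, and all latency and accuracy numbers below are hypothetical, chosen only to show how one might pick the most accurate branch that fits a latency budget. In a content-aware setting, the `predicted_accuracy` values would presumably be re-estimated from the current video content rather than fixed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Branch:
    """One execution branch of a multi-branch detector (hypothetical type)."""
    name: str
    latency_ms: float          # assumed: latency profiled offline on the device
    predicted_accuracy: float  # assumed: estimate from some content-aware predictor


def pick_branch(branches: Sequence[Branch], budget_ms: float) -> Optional[Branch]:
    """Return the most accurate branch whose latency fits the budget.

    Ties in accuracy are broken in favor of the lower-latency branch.
    Returns None when no branch satisfies the budget.
    """
    feasible = [b for b in branches if b.latency_ms <= budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda b: (b.predicted_accuracy, -b.latency_ms))


# Illustrative values only; not measurements from the paper.
EXAMPLE_BRANCHES = [
    Branch("res224", latency_ms=20.0, predicted_accuracy=0.55),
    Branch("res384", latency_ms=45.0, predicted_accuracy=0.68),
    Branch("res576", latency_ms=90.0, predicted_accuracy=0.74),
]

if __name__ == "__main__":
    for budget in (10.0, 50.0, 100.0):
        chosen = pick_branch(EXAMPLE_BRANCHES, budget)
        print(budget, chosen.name if chosen else None)
```

Here the branches differ only in input resolution, the one example knob the abstract names. A real framework would likely expose more knobs and would need to handle the case where no branch meets the budget, for instance by falling back to the cheapest one.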


Cite

Text

Xu et al. "SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00256

Markdown

[Xu et al. "SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/xu2022cvpr-smartadapt/) doi:10.1109/CVPR52688.2022.00256

BibTeX

@inproceedings{xu2022cvpr-smartadapt,
  title     = {{SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles}},
  author    = {Xu, Ran and Mu, Fangzhou and Lee, Jayoung and Mukherjee, Preeti and Chaterji, Somali and Bagchi, Saurabh and Li, Yin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {2528--2538},
  doi       = {10.1109/CVPR52688.2022.00256},
  url       = {https://mlanthology.org/cvpr/2022/xu2022cvpr-smartadapt/}
}