MAMBA: Multi-Level Aggregation via Memory Bank for Video Object Detection

Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

AAAI 2021 pp. 2620-2627

doi:10.1609/AAAI.V35I3.16365 /aaai/2021/sun2021aaai-mamba/

Abstract

State-of-the-art video object detection methods maintain a memory structure, either a sliding window or a memory queue, to enhance the current frame using attention mechanisms. However, we argue that these memory structures are not efficient or sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wise memory updating, preventing the memory from capturing more temporal information. In this paper, we propose a multi-level aggregation architecture via memory bank called MAMBA. Specifically, our memory bank employs two novel operations to eliminate disadvantages of existing methods: (1) light-weight key-set construction which can significantly reduce the computational cost; (2) fine-grained feature-wise updating strategy which enables our method to utilize knowledge from the whole video. To better enhance features from complementary levels, i.e., feature maps and proposals, we further propose a generalized enhancement operation (GEO) to aggregate multi-level features in a unified manner. We conduct extensive evaluations on the challenging ImageNetVID dataset. Compared with existing state-of-the-art methods, our method achieves superior performance in terms of both speed and accuracy. More remarkably, MAMBA achieves mAP of 83.7%/84.6% at 12.6/9.1 FPS with ResNet-101.

PDF AAAI Semantic Scholar

Cite

Text

Sun et al. "MAMBA: Multi-Level Aggregation via Memory Bank for Video Object Detection." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I3.16365

Markdown

[Sun et al. "MAMBA: Multi-Level Aggregation via Memory Bank for Video Object Detection." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/sun2021aaai-mamba/) doi:10.1609/AAAI.V35I3.16365

BibTeX

@inproceedings{sun2021aaai-mamba,
  title     = {{MAMBA: Multi-Level Aggregation via Memory Bank for Video Object Detection}},
  author    = {Sun, Guanxiong and Hua, Yang and Hu, Guosheng and Robertson, Neil},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {2620-2627},
  doi       = {10.1609/AAAI.V35I3.16365},
  url       = {https://mlanthology.org/aaai/2021/sun2021aaai-mamba/}
}