Mamba YOLO: A Simple Baseline for Object Detection with State Space Model

Wang, Zeyu; Li, Chen; Xu, Huiying; Zhu, Xinzhong; Li, Hongbo

doi:10.1609/AAAI.V39I8.32885

Zeyu Wang, Chen Li, Huiying Xu, Xinzhong Zhu, Hongbo Li

AAAI 2025 pp. 8205-8213

Abstract

Driven by the rapid development of deep learning technology, the YOLO series has set a new benchmark for real-time object detectors. Additionally, Transformer-based structures have emerged as the most powerful solution in the field, greatly extending the model's receptive field and achieving significant performance improvements. However, this improvement comes at a cost: the quadratic complexity of the self-attention mechanism increases the computational burden of the model. To address this problem, we introduce a simple yet effective baseline approach called Mamba YOLO. Our contributions are as follows: 1) We propose the ODMamba backbone, which introduces a State Space Model (SSM) with linear complexity to address the quadratic complexity of self-attention. Unlike other Transformer-based and SSM-based methods, ODMamba is simple to train and requires no pretraining. 2) To meet real-time requirements, we design the macro structure of ODMamba and determine the optimal stage ratio and scaling size. 3) We design the RG Block, which employs a multi-branch structure to model the channel dimensions and addresses possible limitations of SSMs in sequence modeling, such as insufficient receptive fields and weak image localization. This design captures localized image dependencies significantly more accurately. Extensive experiments on the publicly available COCO benchmark dataset show that Mamba YOLO achieves state-of-the-art performance compared to previous methods. Specifically, a tiny version of Mamba YOLO achieves a 7.5% improvement in mAP on a single 4090 GPU with an inference time of 1.5 ms.
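
The complexity contrast drawn in the abstract can be illustrated with a generic diagonal linear state space recurrence. What follows is a minimal sketch under stated assumptions, not the ODMamba backbone or its scan mechanism: the function names `ssm_scan` and `op_counts`, the parameters `a`, `b`, `c`, and the simplified operation counts are all hypothetical and chosen only for illustration.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear SSM over a 1-D input sequence.

    Recurrence (elementwise over the state vector):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)
    The loop touches each input once, so cost grows linearly
    with the sequence length.
    """
    h = [0.0] * len(a)  # hidden state, initialised to zero
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def op_counts(seq_len, state_dim):
    """Rough illustrative counts: state updates for the scan versus
    pairwise token interactions for full self-attention."""
    scan_updates = seq_len * state_dim    # linear in seq_len
    attention_pairs = seq_len * seq_len   # quadratic in seq_len
    return scan_updates, attention_pairs


if __name__ == "__main__":
    # Impulse response of a one-dimensional state with decay 0.5.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    print(op_counts(seq_len=1024, state_dim=16))
```

Doubling `seq_len` doubles the scan count but quadruples the attention pair count, which is the scaling gap the abstract refers to.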

Cite

Text

Wang et al. "Mamba YOLO: A Simple Baseline for Object Detection with State Space Model." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32885

Markdown

[Wang et al. "Mamba YOLO: A Simple Baseline for Object Detection with State Space Model." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wang2025aaai-mamba/) doi:10.1609/AAAI.V39I8.32885

BibTeX

@inproceedings{wang2025aaai-mamba,
  title     = {{Mamba YOLO: A Simple Baseline for Object Detection with State Space Model}},
  author    = {Wang, Zeyu and Li, Chen and Xu, Huiying and Zhu, Xinzhong and Li, Hongbo},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8205-8213},
  doi       = {10.1609/AAAI.V39I8.32885},
  url       = {https://mlanthology.org/aaai/2025/wang2025aaai-mamba/}
}