Mamba YOLO: A Simple Baseline for Object Detection with State Space Model
Abstract
Driven by the rapid development of deep learning, the YOLO series has set a new benchmark for real-time object detectors. Meanwhile, transformer-based architectures have emerged as the most powerful solution in the field, greatly extending the model's receptive field and achieving significant performance gains. However, this improvement comes at a cost: the quadratic complexity of the self-attention mechanism increases the computational burden of the model. To address this problem, we introduce a simple yet effective baseline called Mamba YOLO. Our contributions are as follows: 1) We propose the ODMamba backbone, which introduces a State Space Model (SSM) with linear complexity to address the quadratic complexity of self-attention. Unlike other Transformer-based and SSM-based methods, ODMamba is simple to train and requires no pretraining. 2) To meet real-time requirements, we design the macro structure of ODMamba and determine its optimal stage ratio and scaling size. 3) We design the RG Block, which employs a multi-branch structure to model the channel dimension and addresses possible limitations of SSMs in sequence modeling, such as insufficient receptive fields and weak image localization. This design captures local image dependencies significantly more accurately. Extensive experiments on the publicly available COCO benchmark show that Mamba YOLO achieves state-of-the-art performance compared to previous methods. Specifically, a tiny version of Mamba YOLO achieves a 7.5% improvement in mAP with an inference time of 1.5 ms on a single RTX 4090 GPU.
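The linear-versus-quadratic complexity argument can be illustrated with a minimal sketch. The Python below is not the ODMamba implementation: it assumes a single-channel, time-invariant SSM with a diagonal state matrix discretized by zero-order hold, and contrasts it with a toy single-head self-attention that uses identity projections. The function names `ssm_scan` and `self_attention_1d` and all parameter values are hypothetical. The scan performs one state update per token (O(L·N) for sequence length L and state size N), whereas attention scores every pair of tokens (O(L²)).

```python
import math
from typing import List


def ssm_scan(x: List[float], a: List[float], b: List[float],
             c: List[float], delta: float) -> List[float]:
    """Linear-time scan of a diagonal, time-invariant SSM over one channel.

    Continuous system: h'(t) = A h(t) + B x(t), y(t) = C h(t), A = diag(a).
    Zero-order-hold discretization with step size delta (assumes every a_i != 0,
    typically a_i < 0 for stability):
        A_bar = exp(delta * a),  B_bar = (exp(delta * a) - 1) / a * b
    Cost: one state update per token, i.e. O(L * N) for length L, state size N.
    """
    if any(ai == 0.0 for ai in a):
        raise ValueError("this ZOH formula needs nonzero diagonal entries in a")
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(math.exp(delta * ai) - 1.0) / ai * bi for ai, bi in zip(a, b)]
    h = [0.0] * len(a)  # hidden state starts at h_0 = 0
    y = []
    for xt in x:
        # h_t = A_bar * h_{t-1} + B_bar * x_t (elementwise because A is diagonal)
        h = [ab * hi + bb * xt for ab, hi, bb in zip(a_bar, h, b_bar)]
        # y_t = C . h_t
        y.append(sum(ci * hi for ci, hi in zip(c, h)))
    return y


def self_attention_1d(x: List[float]) -> List[float]:
    """Toy single-head softmax attention with q = k = v = x (no projections).

    Every query is scored against every key: L * L scores, i.e. O(L^2) time.
    """
    out = []
    for q in x:
        scores = [q * k for k in x]          # one row of the L x L score matrix
        m = max(scores)                      # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append(sum(wi * v for wi, v in zip(w, x)) / z)
    return out


if __name__ == "__main__":
    seq = [1.0, 1.0, 1.0]
    # With a = -1 and delta = ln 2, A_bar and B_bar are both approximately 0.5
    print(ssm_scan(seq, a=[-1.0], b=[1.0], c=[1.0], delta=math.log(2.0)))
    print(self_attention_1d(seq))
```

Selective SSMs such as Mamba additionally make Δ, B and C functions of the input, which this sketch omits for brevity. As an illustrative count, a flattened 80×80 feature map gives L = 6400 tokens: attention computes 6400² = 40,960,000 pairwise scores, while a scan with a hypothetical state size N = 16 performs 6400 × 16 = 102,400 state updates.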
Cite
Wang et al. "Mamba YOLO: A Simple Baseline for Object Detection with State Space Model." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32885
@inproceedings{wang2025aaai-mamba,
title = {{Mamba YOLO: A Simple Baseline for Object Detection with State Space Model}},
author = {Wang, Zeyu and Li, Chen and Xu, Huiying and Zhu, Xinzhong and Li, Hongbo},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
  pages = {8205--8213},
doi = {10.1609/AAAI.V39I8.32885},
url = {https://mlanthology.org/aaai/2025/wang2025aaai-mamba/}
}