DAMamba: Vision State Space Model with Dynamic Adaptive Scan

Abstract

State space models (SSMs) have recently garnered significant attention in computer vision. However, due to the unique characteristics of image data, SSMs adapted from natural language processing to computer vision have not outperformed state-of-the-art convolutional neural networks (CNNs) and Vision Transformers (ViTs). Existing vision SSMs primarily rely on manually designed scans that flatten image patches into sequences locally or globally. This approach disrupts the original semantic spatial adjacency of the image and lacks flexibility, making it difficult to capture complex image structures. To address this limitation, we propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions. DAS enables more flexible modeling while retaining linear computational complexity and global modeling capacity. Based on DAS, we further propose the vision backbone DAMamba, which significantly outperforms popular vision Mamba models on vision tasks such as image classification, object detection, instance segmentation, and semantic segmentation. Notably, it also surpasses some of the latest state-of-the-art CNNs and ViTs.
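
For intuition, below is a minimal, hypothetical PyTorch sketch of a content-adaptive scan in the spirit described by the abstract: a small scoring head predicts a per-position score, the flattened patch sequence is reordered by that score before a linear-time sequence model processes it, and the outputs are scattered back to the original spatial layout. The scoring head, the argsort-based reordering, and the GRU stand-in for the SSM block are illustrative assumptions, not the authors' DAS implementation.

```python
# Minimal sketch of a data-dependent ("dynamic adaptive") scan.
# All module choices here are hypothetical simplifications.
import torch
import torch.nn as nn


class DynamicAdaptiveScanBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Lightweight head that scores each spatial position from its content.
        self.score_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        # Stand-in linear-time sequence model; DAMamba uses an SSM block instead.
        self.seq_model = nn.GRU(channels, channels, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape

        # Predict a content-dependent score per position and derive a scan order
        # by sorting, instead of using a fixed raster or cross scan.
        scores = self.score_head(x).flatten(1)            # (B, N)
        order = scores.argsort(dim=1)                     # (B, N) scan order

        # Flatten patches to a sequence and gather them in the predicted order.
        seq = x.flatten(2).transpose(1, 2)                # (B, N, C)
        gather_idx = order.unsqueeze(-1).expand(-1, -1, c)
        scanned = seq.gather(1, gather_idx)               # (B, N, C), reordered

        # Linear-time sequence modeling over the adaptive ordering.
        out, _ = self.seq_model(scanned)

        # Scatter results back to the original spatial positions.
        restored = torch.empty_like(out)
        restored.scatter_(1, gather_idx, out)
        return restored.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = DynamicAdaptiveScanBlock(channels=32)
    y = block(torch.randn(2, 32, 32, 32))
    print(y.shape)  # torch.Size([2, 32, 32, 32])
```

The sketch only conveys the core idea of replacing a fixed, hand-designed flattening with a data-dependent ordering; in the paper, the scan orders and regions are learned end to end within the SSM backbone.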

Cite

Text

Li et al. "DAMamba: Vision State Space Model with Dynamic Adaptive Scan." Advances in Neural Information Processing Systems, 2025.

Markdown

[Li et al. "DAMamba: Vision State Space Model with Dynamic Adaptive Scan." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/li2025neurips-damamba/)

BibTeX

@inproceedings{li2025neurips-damamba,
  title     = {{DAMamba: Vision State Space Model with Dynamic Adaptive Scan}},
  author    = {Li, Tanzhe and Li, Caoshuo and Lyu, Jiayi and Pei, Hongjuan and Zhang, Baochang and Jin, Taisong and Ji, Rongrong},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/li2025neurips-damamba/}
}