Fore-Mamba3D: Mamba-Based Foreground-Enhanced Encoding for 3D Object Detection

Abstract

Linear modeling methods such as Mamba have emerged as effective backbones for 3D object detection. However, previous Mamba-based methods apply bidirectional encoding to the entire sequence of non-empty voxels, which contains abundant uninformative background information. Although directly encoding only the foreground voxels appears to be a plausible solution, it tends to degrade detection performance. We attribute this to response attenuation and restricted context representation in linear modeling of foreground-only sequences. To address this problem, we propose a novel backbone, termed Fore-Mamba3D, that focuses on foreground enhancement by modifying the Mamba-based encoder. The foreground voxels are first sampled according to their predicted scores. To counter the response attenuation that arises when foreground voxels from different instances interact, we design a regional-to-global slide window (RGSW) that propagates information from regional splits to the entire sequence. Furthermore, a semantic-assisted and state spatial fusion module (SASFMamba) is proposed to enrich contextual representation by enhancing semantic and geometric awareness within the Mamba model. Our method emphasizes foreground-only encoding and alleviates the distance-based and causal dependencies in the linear autoregression model. Superior performance across various benchmarks demonstrates the effectiveness of Fore-Mamba3D for 3D object detection.
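
As a rough illustration of the score-based foreground sampling step mentioned in the abstract, the following minimal Python sketch keeps the highest-scoring voxels while preserving their original order in the sequence. The abstract does not state the selection rule, so the top-k strategy, the function name `sample_foreground`, and the `keep_ratio` parameter are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def sample_foreground(scores: Sequence[float], keep_ratio: float = 0.5) -> List[int]:
    """Return indices of the highest-scoring voxels, in original sequence order.

    Hypothetical helper: top-k selection and `keep_ratio` are assumptions,
    not details taken from the paper.
    """
    n = len(scores)
    if n == 0:
        return []
    # Number of voxels to keep, clamped to the range [1, n].
    k = min(n, max(1, round(n * keep_ratio)))
    # Rank voxel indices by predicted foreground score, highest first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Re-sort the kept indices so the serialized voxel order is preserved.
    return sorted(ranked[:k])


# Six voxels with predicted foreground scores; keep the top half.
kept = sample_foreground([0.1, 0.9, 0.3, 0.8, 0.2, 0.7], keep_ratio=0.5)
print(kept)  # [1, 3, 5]
```

Restoring the original index order after selection matters for a sequence model, since the encoder consumes the kept voxels as an ordered sequence.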

Cite

Text

Ning et al. "Fore-Mamba3D: Mamba-Based Foreground-Enhanced Encoding for 3D Object Detection." International Conference on Learning Representations, 2026.

Markdown

[Ning et al. "Fore-Mamba3D: Mamba-Based Foreground-Enhanced Encoding for 3D Object Detection." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/ning2026iclr-foremamba3d/)

BibTeX

@inproceedings{ning2026iclr-foremamba3d,
  title     = {{Fore-Mamba3D: Mamba-Based Foreground-Enhanced Encoding for 3D Object Detection}},
  author    = {Ning, Zhiwei and Gao, Xuanang and Cao, Jiaxi and Yang, Runze and Xu, Huiying and Zhu, Xinzhong and Yang, Jie and Liu, Wei},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/ning2026iclr-foremamba3d/}
}