MOCID: Motion Context and Displacement Information Learning for Moving Infrared Small Target Detection
Abstract
In the field of Moving Infrared Small Target Detection (MIRSTD), current methods typically use sequential modeling with two individual modules for spatial and temporal processing. However, such a modeling strategy lacks clear guidance on the motion and displacement difference between moving targets and background noise, thereby limiting the feature discriminability and resulting in error-prone target localization. This paper addresses this issue from clip and frame levels and proposes a novel architecture MOCID for MIRSTD. For clip-level feature fusion, we design a spatio-temporal backbone consisting of several proposed Fourier-inspired Spatio-temporal Attention (FISTA) layers. Each FISTA layer sequentially processes the features from spatial and temporal views to capture clip-level temporal motion context, where Fourier Transformation and Inverse Fourier Transformation are employed for each view. This context is then embedded into dynamic convolutional kernels for subsequent spatial feature extraction, thereby enabling clear motion difference guidance and generating comprehensive features. For frame-level feature fusion, we design a Displacement-aware Mamba Module (DAM) to capture detailed frame-to-frame displacement information. DAM utilizes an innovative Temporal Interpolation and Displacement-aware Scan technique to perform spatio-temporal difference-aware displacement modeling, introducing elaborate temporal indicators into feature extraction. Combining the above improvements, our model captures comprehensive motion and displacement contexts, significantly improving the detection of the small target. Extensive experiments demonstrate that MOCID achieves state-of-the-art detection accuracy on popular IRDST and DAUB datasets. Furthermore, MOCID offers a superior balance between throughput and performance compared to other methods. The code for this work will be made publicly available.
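As a rough illustration of the "Fourier transform, then inverse Fourier transform, per view" idea mentioned in the abstract, the following NumPy sketch filters a clip first in the spatial frequency domain and then in the temporal frequency domain. It is not the authors' FISTA layer: the function names (`fourier_mix`, `spatial_then_temporal`), the fixed multiplicative weights, and the toy clip are all assumptions made for illustration, and the attention mechanism and dynamic convolutional kernels described above are omitted.

```python
import numpy as np

def fourier_mix(x, weight, axes):
    """FFT over `axes`, multiply by a (broadcastable) weight, inverse FFT back."""
    spectrum = np.fft.fftn(x, axes=axes)
    return np.fft.ifftn(spectrum * weight, axes=axes).real

def spatial_then_temporal(clip, w_spatial, w_temporal):
    """clip has shape (T, H, W); process the spatial view, then the temporal view."""
    y = fourier_mix(clip, w_spatial, axes=(1, 2))  # 2D spectrum of each frame
    return fourier_mix(y, w_temporal, axes=(0,))   # 1D spectrum of each pixel over time

# Toy clip (hypothetical data): 5 frames of 8x8 random values.
rng = np.random.default_rng(0)
clip = rng.standard_normal((5, 8, 8))

# All-ones weights leave the clip unchanged (FFT followed by inverse FFT).
identity_s = np.ones((1, 8, 8))
identity_t = np.ones((5, 1, 1))
out = spatial_then_temporal(clip, identity_s, identity_t)

# Zeroing the temporal DC bin removes each pixel's mean over time,
# i.e. anything that stays constant across the clip.
high_pass_t = np.ones((5, 1, 1))
high_pass_t[0] = 0.0
moving_only = spatial_then_temporal(clip, identity_s, high_pass_t)
```

In this sketch the weights are fixed arrays; a learned layer would presumably replace them with trainable parameters. The temporal high-pass case hints at why a frequency-domain temporal view can help separate moving targets from static background, though the paper's actual mechanism may differ.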
Cite
Text
Zhang et al. "MOCID: Motion Context and Displacement Information Learning for Moving Infrared Small Target Detection." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I10.33087
Markdown
[Zhang et al. "MOCID: Motion Context and Displacement Information Learning for Moving Infrared Small Target Detection." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhang2025aaai-mocid/) doi:10.1609/AAAI.V39I10.33087
BibTeX
@inproceedings{zhang2025aaai-mocid,
title = {{MOCID: Motion Context and Displacement Information Learning for Moving Infrared Small Target Detection}},
author = {Zhang, Mingjin and Ouyang, Yuanjun and Gao, Fei and Guo, Jie and Zhang, Qiming and Zhang, Jing},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {10022-10030},
doi = {10.1609/AAAI.V39I10.33087},
url = {https://mlanthology.org/aaai/2025/zhang2025aaai-mocid/}
}