Multimodal Image Matching Based on Cross-Modality Completion Pre-Training
Abstract
Differences between imaging devices introduce modality gaps and geometric distortions in multimodal images, complicating the matching task. Deep learning-based matching methods struggle with multimodal images due to the lack of large annotated multimodal datasets. To address these challenges, we propose XCP-Match, a method built on cross-modality completion pre-training. XCP-Match has two phases. (1) Self-supervised cross-modality completion pre-training on a real multimodal image dataset. We develop a novel pre-training model to learn cross-modal semantic features. The pre-training uses a masked image modeling approach for cross-modality completion and introduces an attention-weighted contrastive loss that emphasizes matching in overlapping areas. (2) Supervised fine-tuning for multimodal image matching on the augmented MegaDepth dataset. XCP-Match constructs a complete matching framework to overcome geometric distortions and achieve precise matching. The two-phase training encourages the model to learn deep cross-modal semantic information, improving adaptation to modality gaps without requiring large annotated datasets. Experiments demonstrate that XCP-Match outperforms existing algorithms on public datasets.
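The attention-weighted contrastive loss described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a standard InfoNCE-style formulation in which each positive pair's term is reweighted by a per-location attention score (here the hypothetical `attn` array), so that pairs in overlapping regions contribute more to the loss:

```python
import numpy as np

def attention_weighted_contrastive_loss(feats_a, feats_b, attn, tau=0.07):
    """Hypothetical sketch of an attention-weighted InfoNCE loss.

    feats_a, feats_b: (N, D) L2-normalized descriptors of corresponding
        locations in the two modalities; row i of each forms a positive pair.
    attn: (N,) non-negative attention weights (assumed to emphasize
        overlapping regions, as in the paper's description).
    tau: temperature scaling the similarity logits.
    """
    # Pairwise similarity logits between all locations across modalities.
    sim = feats_a @ feats_b.T / tau                              # (N, N)
    # Log-softmax over each row: log-probability of the true match.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    per_pair = -np.diag(log_prob)            # InfoNCE term per positive pair
    w = attn / attn.sum()                    # normalize attention weights
    return float((w * per_pair).sum())       # attention-weighted average
```

With uniform `attn` this reduces to a plain symmetric-free InfoNCE average; non-uniform weights shift the optimization pressure toward the overlapping area.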
Cite
Yang et al. "Multimodal Image Matching Based on Cross-Modality Completion Pre-Training." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/246
@inproceedings{yang2025ijcai-multimodal,
title = {{Multimodal Image Matching Based on Cross-Modality Completion Pre-Training}},
author = {Yang, Meng and Fan, Fan and Huang, Jun and Ma, Yong and Mei, Xiaoguang and Cai, Zhanchuan and Ma, Jiayi},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {2206-2214},
doi = {10.24963/IJCAI.2025/246},
url = {https://mlanthology.org/ijcai/2025/yang2025ijcai-multimodal/}
}