Fusion from a Distributional Perspective: A Unified Symbiotic Diffusion Framework for Any Multisource Remote Sensing Data Classification

Abstract

Visual grounding (VG) is the task of detecting specific objects in an image from a linguistic expression, and it plays an important role in the advanced interpretation of natural images. In remote sensing image interpretation, visual grounding is hampered by complex scenes and widely varying object sizes. To address this problem, we propose a novel remote sensing visual grounding (RSVG) framework named the language-guided hybrid representation learning Transformer (LGFormer). Specifically, we design a multimodal dual-encoder Transformer structure called the adaptive multimodal feature fusion module. This structure integrates text and visual features as hybrid queries, enabling early-stage decoding queries to perceive the target position accurately. The hybrid queries then aggregate the modality-specific information from the dual encoders to obtain the final object embedding for coordinate regression. In addition, a multi-scale cross-modal feature enhancement module (MSCM) is designed to enhance the self-representation of the extracted text and visual features and to align them semantically. To build the hybrid queries, we use linguistic guidance to select visual features as the visual part and take sentence-level features as the textual part. Finally, LGFormer achieves the best results among existing models on the DIOR-RSVG and OPT-RSVG datasets.
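The abstract states that the hybrid queries combine language-selected visual features with a sentence-level text feature, but it does not say how the selection is performed. The sketch below is only an illustration under stated assumptions: it ranks visual tokens by cosine similarity to the sentence feature and keeps the top `k`. The scoring rule, the value of `k`, and the names `cosine` and `build_hybrid_queries` are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_hybrid_queries(visual_tokens, sentence_feature, k):
    """Return (queries, selected_indices).

    Assumption: "linguistic guidance" is modelled as top-k cosine similarity
    between each visual token and the sentence-level feature. The visual part
    is the k best-matching tokens; the textual part is the sentence feature.
    """
    scores = [cosine(tok, sentence_feature) for tok in visual_tokens]
    ranked = sorted(range(len(visual_tokens)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:k]
    visual_part = [visual_tokens[i] for i in selected]
    return visual_part + [sentence_feature], selected


# Toy example: four 2-D visual tokens and one 2-D sentence feature.
visual_tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
sentence_feature = [1.0, 0.2]
queries, selected = build_hybrid_queries(visual_tokens, sentence_feature, k=2)
```

In a full model, queries of this kind would be passed to a Transformer decoder that attends to both encoders; that stage is omitted here because the abstract gives no further detail.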

Cite

Text

Yang et al. "Fusion from a Distributional Perspective: A Unified Symbiotic Diffusion Framework for Any Multisource Remote Sensing Data Classification." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/174

Markdown

[Yang et al. "Fusion from a Distributional Perspective: A Unified Symbiotic Diffusion Framework for Any Multisource Remote Sensing Data Classification." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/yang2024ijcai-fusion/) doi:10.24963/ijcai.2024/174

BibTeX

@inproceedings{yang2024ijcai-fusion,
  title     = {{Fusion from a Distributional Perspective: A Unified Symbiotic Diffusion Framework for Any Multisource Remote Sensing Data Classification}},
  author    = {Yang, Teng and Xiao, Song and Dong, Wenqian and Qu, Jiahui and Yang, Yueguang},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {1570--1578},
  doi       = {10.24963/ijcai.2024/174},
  url       = {https://mlanthology.org/ijcai/2024/yang2024ijcai-fusion/}
}