GTP-4o: Modality-Prompted Heterogeneous Graph Learning for Omni-Modal Biomedical Representation

Abstract

Recent advances in multi-modal representation learning have seen success in biomedical domains. While established techniques can handle multi-modal information, challenges arise when they are extended to diverse clinical modalities and the practical modality-missing setting, owing to inherent modality gaps. To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph for Omni-modal Learning (GTP-4o), which embeds numerous disparate clinical modalities into a unified representation, completes the deficient embedding of a missing modality, and reformulates cross-modal learning with graph-based aggregation. Specifically, we establish a heterogeneous graph embedding to explicitly capture the diverse semantic properties of both modality-specific features (nodes) and cross-modal relations (edges). We then design a modality-prompted completion that restores the inadequate graph representation of a missing modality through a graph prompting mechanism, which generates hallucinated graph topologies to steer the missing embedding towards the intact representation. On the completed graph, we develop a knowledge-guided hierarchical cross-modal aggregation consisting of a global meta-path neighbouring module, which uncovers potential heterogeneous neighbors along pathways driven by domain knowledge, and a local multi-relation aggregation module for comprehensive cross-modal interaction across heterogeneous relations. We assess the efficacy of our methodology in rigorous benchmarking experiments against prior state-of-the-art methods. In a nutshell, GTP-4o presents an initial foray into the intriguing realm of embedding, relating and perceiving heterogeneous patterns from various clinical modalities holistically via graph theory. Project page: https://gtp-4-o.github.io/.
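The modality-prompted completion described in the abstract can be pictured as follows: each modality is a node in a heterogeneous graph, and a missing modality's embedding is hallucinated from a learnable modality prompt plus a message aggregated over the observed neighbors. The sketch below is a minimal, hypothetical illustration of that idea (the function name, prompt parameterization, and aggregation rule are assumptions for exposition, not the authors' implementation):

```python
import numpy as np

def complete_missing_modality(embeddings, present_mask, prompts, adj):
    """Hedged sketch (not the paper's code): complete the node embedding of a
    missing modality by combining a learnable modality prompt with a weighted
    aggregation over the embeddings of the modalities that are present.

    embeddings:   (M, D) array of per-modality node embeddings (rows for
                  missing modalities may be arbitrary or zero).
    present_mask: (M,) boolean array, True where the modality is observed.
    prompts:      (M, D) learnable modality prompts (hypothetical parameters).
    adj:          (M, M) cross-modal relation weights (the graph edges).
    """
    completed = embeddings.copy()
    for i in range(len(present_mask)):
        if present_mask[i]:
            continue  # observed modalities keep their own embedding
        # Aggregate only over observed neighbors, renormalising edge weights.
        w = adj[i] * present_mask
        if w.sum() > 0:
            w = w / w.sum()
        neighbor_msg = w @ embeddings
        # The prompt steers the hallucinated embedding towards an intact one.
        completed[i] = prompts[i] + neighbor_msg
    return completed
```

In the full method, the prompts and edge weights would be learned end-to-end and followed by the hierarchical meta-path and multi-relation aggregation; this sketch only shows the completion step in isolation.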

Cite

Text

Li et al. "GTP-4o: Modality-Prompted Heterogeneous Graph Learning for Omni-Modal Biomedical Representation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73235-5_10

Markdown

[Li et al. "GTP-4o: Modality-Prompted Heterogeneous Graph Learning for Omni-Modal Biomedical Representation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/li2024eccv-gtp4o/) doi:10.1007/978-3-031-73235-5_10

BibTeX

@inproceedings{li2024eccv-gtp4o,
  title     = {{GTP-4o: Modality-Prompted Heterogeneous Graph Learning for Omni-Modal Biomedical Representation}},
  author    = {Li, Chenxin and Liu, Xinyu and Wang, Cheng and Liu, Yifan and Yu, Weihao and Shao, Jing and Yuan, Yixuan},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73235-5_10},
  url       = {https://mlanthology.org/eccv/2024/li2024eccv-gtp4o/}
}