Safety of Multimodal Large Language Models on Images and Text

Abstract

Knowledge Graph (KG)-augmented Large Language Models (LLMs) have recently propelled significant advances in complex reasoning tasks, thanks to their broad domain knowledge and contextual awareness. Unfortunately, current methods often assume KGs to be complete, which is impractical given the inherent limitations of KG construction and the potential loss of contextual cues when converting unstructured text into entity-relation triples. In response, this paper proposes the Triple Context Restoration and Query-driven Feedback (TCR-QF) framework, which reconstructs the textual context underlying each triple to mitigate information loss, while dynamically refining the KG structure by iteratively incorporating query-relevant missing knowledge. Experiments on five benchmark question-answering datasets substantiate the effectiveness of TCR-QF in KG and LLM integration, where it achieves a 29.1% improvement in Exact Match and a 15.5% improvement in F1 over its state-of-the-art GraphRAG competitors. The code is publicly available at https://github.com/HFUT-DMiC-Lab/TCR-QF.git.

Cite

Text

Liu et al. "Safety of Multimodal Large Language Models on Images and Text." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/901

Markdown

[Liu et al. "Safety of Multimodal Large Language Models on Images and Text." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/liu2024ijcai-safety/) doi:10.24963/ijcai.2024/901

BibTeX

@inproceedings{liu2024ijcai-safety,
  title     = {{Safety of Multimodal Large Language Models on Images and Text}},
  author    = {Liu, Xin and Zhu, Yichen and Lan, Yunshi and Yang, Chao and Qiao, Yu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {8151--8159},
  doi       = {10.24963/ijcai.2024/901},
  url       = {https://mlanthology.org/ijcai/2024/liu2024ijcai-safety/}
}