SaGol: Using MiniGPT-4 to Generate Alt Text for Improving Image Accessibility

Abstract

The remarkable success of Large Language Models (LLMs) across diverse tasks has driven the research community to extend their capabilities to molecular applications. However, most molecular LLMs employ adapter-based architectures that fail to equally integrate molecule and text modalities and lack explicit supervision signals for the molecular modality. To address these issues, we introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture that expands the vocabulary of LLMs with molecule tokens. Specifically, we introduce a Vector Quantization-driven tokenizer that incorporates a Q-Former to bridge the modality gap between molecule and text. This tokenizer transforms molecular structures into sequences of tokens exhibiting causal dependency, thereby encapsulating both high-level molecular features and textual information. Equipped with this tokenizer, UniMoT unifies molecule and text modalities under a shared token representation and an autoregressive training paradigm. This enables the model to process molecular structures as a distinct linguistic system and generate them in textual form. Through a four-stage training scheme, UniMoT functions as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks. Extensive experiments demonstrate that UniMoT achieves state-of-the-art performance across a wide range of molecule comprehension and generation tasks.

Cite

Text

Moon et al. "SaGol: Using MiniGPT-4 to Generate Alt Text for Improving Image Accessibility." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/1023

Markdown

[Moon et al. "SaGol: Using MiniGPT-4 to Generate Alt Text for Improving Image Accessibility." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/moon2024ijcai-sagol/) doi:10.24963/ijcai.2024/1023

BibTeX

@inproceedings{moon2024ijcai-sagol,
  title     = {{SaGol: Using MiniGPT-4 to Generate Alt Text for Improving Image Accessibility}},
  author    = {Moon, Yunseo and Lee, Hyunmin and Oh, Seung Young and Jung, Hyunggu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {8745--8748},
  doi       = {10.24963/ijcai.2024/1023},
  url       = {https://mlanthology.org/ijcai/2024/moon2024ijcai-sagol/}
}