KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge
Abstract
Molecular large language models have attracted widespread attention because of their promising potential in molecular applications. However, current molecular large language models are significantly limited in their understanding of molecules, owing to inadequate textual descriptions and suboptimal molecular representation strategies during pretraining. To address these challenges, we introduce KnowMol-100K, a large-scale dataset of 100K fine-grained molecular annotations across multiple levels, which bridges the gap between molecules and their textual descriptions. Additionally, we propose a chemically informative molecular representation that addresses the limitations of existing molecular representation strategies. Building on these innovations, we develop KnowMol, a state-of-the-art multi-modal molecular large language model. Extensive experiments demonstrate that KnowMol achieves superior performance across molecular understanding and generation tasks.
Cite
Yang et al. "KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge." Advances in Neural Information Processing Systems, 2025.
@inproceedings{yang2025neurips-knowmol,
title = {{KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge}},
author = {Yang, Zaifei and Chang, Hong and Hou, RuiBing and Shan, Shiguang and Chen, Xilin},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/yang2025neurips-knowmol/}
}