Oasis: Data Curation and Assessment System for Pretraining of Large Language Models
Abstract
Retrosynthesis, which predicts the reactants of a given target molecule, is an essential task for drug discovery. Retrosynthesis prediction based on molecular graph editing has garnered widespread attention due to its excellent interpretability. However, existing methods fail to effectively incorporate chemical knowledge when learning molecular representations. To address this issue, we propose a Knowledge-enhanced Graph Contrastive Learning model (KGCL), which retrieves functional group embeddings from a chemical knowledge graph and integrates them into the atomic embeddings of the product molecule using an attention mechanism. Furthermore, we introduce a graph contrastive learning strategy that generates augmented samples using graph edits to improve the molecular graph encoder. Our proposed method outperforms the strong baseline Graph2Edits by 1.6% and 3.2% in top-1 accuracy and top-1 round-trip accuracy, respectively, on the USPTO-50K dataset, and also achieves new state-of-the-art performance among semi-template-based methods on the USPTO-FULL dataset.
Cite
Text
Zhou et al. "Oasis: Data Curation and Assessment System for Pretraining of Large Language Models." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/1048
Markdown
[Zhou et al. "Oasis: Data Curation and Assessment System for Pretraining of Large Language Models." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/zhou2024ijcai-oasis/) doi:10.24963/ijcai.2024/1048
BibTeX
@inproceedings{zhou2024ijcai-oasis,
title = {{Oasis: Data Curation and Assessment System for Pretraining of Large Language Models}},
author = {Zhou, Tong and Chen, Yubo and Cao, Pengfei and Liu, Kang and Liu, Shengping and Zhao, Jun},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2024},
pages = {8855--8859},
doi = {10.24963/ijcai.2024/1048},
url = {https://mlanthology.org/ijcai/2024/zhou2024ijcai-oasis/}
}