From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery

Abstract

Molecule discovery serves as a cornerstone in numerous scientific domains, fueling the development of new materials and innovative drug designs. Recent developments in in-silico molecule discovery have highlighted the promising results of cross-modal techniques, which bridge molecular structures with their descriptive annotations. However, these cross-modal methods frequently encounter data scarcity, which hampers their performance and application. In this paper, we address the low-resource challenge by utilizing artificially real data generated by Large Language Models (LLMs). We first introduce a retrieval-based prompting strategy to construct high-quality pseudo data, then explore the optimal method for leveraging this pseudo data. Experiments show that using pseudo data for domain adaptation outperforms all existing methods while requiring a smaller model scale, less data, and lower training cost, highlighting its efficiency. Furthermore, our method shows sustained improvement as the volume of pseudo data increases, revealing the great potential of pseudo data in advancing low-resource cross-modal molecule discovery.

Cite

Text

Chen et al. "From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I20.30198

Markdown

[Chen et al. "From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/chen2024aaai-artificially/) doi:10.1609/AAAI.V38I20.30198

BibTeX

@inproceedings{chen2024aaai-artificially,
  title     = {{From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery}},
  author    = {Chen, Yuhan and Xi, Nuwa and Du, Yanrui and Wang, Haochun and Chen, Jianyu and Zhao, Sendong and Qin, Bing},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {21958--21966},
  doi       = {10.1609/AAAI.V38I20.30198},
  url       = {https://mlanthology.org/aaai/2024/chen2024aaai-artificially/}
}