Data-Efficient Molecular Generation with Hierarchical Textual Inversion

Abstract

Developing an effective molecular generation framework that works with only a limited number of molecules is often important for practical deployment, e.g., in drug discovery, since acquiring task-related molecular data requires costly and time-consuming experiments. To tackle this issue, we introduce Hierarchical Textual Inversion for Molecular Generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is motivated by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution. Building on textual inversion, a recent technique from the visual domain that achieves data-efficient image generation, we propose multi-level token embeddings that reflect such hierarchical features. Compared to the single-level token embedding used by conventional textual inversion in the image domain, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution. We then generate molecules by interpolating the multi-level token embeddings. Extensive experiments demonstrate the superiority of HI-Mol and its notable data efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50× less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction. Code is available at https://github.com/Seojin-Kim/HI-Mol.
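The abstract describes multi-level token embeddings and their interpolation only at a high level. The following is a minimal sketch of the token bookkeeping, not the authors' implementation. It assumes three embedding levels (one token shared by all molecules, one token per cluster, and one token per training molecule), a fixed molecule-to-cluster assignment, and linear interpolation of the cluster- and molecule-level vectors while the shared token is kept fixed. The names `HierarchicalTokens`, `prompt`, and `interpolated_prompt` are hypothetical. Random vectors stand in for embeddings that textual inversion would learn by optimizing them against a frozen text-conditioned generator. Neither that training loop nor the generator is shown.

```python
import random

DIM = 8  # hypothetical embedding width; real text-encoder widths are far larger


def rand_vec(rng, dim=DIM):
    """Random stand-in for a learned token embedding."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def lerp(a, b, lam):
    """Element-wise linear interpolation: (1 - lam) * a + lam * b."""
    return [(1.0 - lam) * x + lam * y for x, y in zip(a, b)]


class HierarchicalTokens:
    """Hypothetical three-level token table (coarse -> fine):
    one shared token, one token per cluster, one token per molecule."""

    def __init__(self, num_clusters, cluster_of, seed=0):
        rng = random.Random(seed)
        self.cluster_of = list(cluster_of)  # molecule index -> cluster index
        self.shared = rand_vec(rng)
        self.cluster = [rand_vec(rng) for _ in range(num_clusters)]
        self.detail = [rand_vec(rng) for _ in range(len(self.cluster_of))]

    def prompt(self, i):
        """Token sequence that would condition a generator on training molecule i."""
        return [self.shared, self.cluster[self.cluster_of[i]], self.detail[i]]

    def interpolated_prompt(self, i, j, lam):
        """Blend the cluster- and molecule-level tokens of molecules i and j.

        The shared token is kept as-is (an assumption of this sketch), so the
        new prompt stays within the learned coarse distribution."""
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must lie in [0, 1]")
        ci, cj = self.cluster[self.cluster_of[i]], self.cluster[self.cluster_of[j]]
        return [
            self.shared,
            lerp(ci, cj, lam),
            lerp(self.detail[i], self.detail[j], lam),
        ]
```

For example, `HierarchicalTokens(num_clusters=2, cluster_of=[0, 0, 1, 1]).interpolated_prompt(0, 2, 0.5)` yields a three-token prompt whose finer levels sit halfway between molecules 0 and 2. In a full system, such a prompt would be passed to the frozen generator to sample a molecule. Separating the levels lets interpolation vary fine-grained features while the coarse, shared information stays fixed.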

Cite

Text

Kim et al. "Data-Efficient Molecular Generation with Hierarchical Textual Inversion." International Conference on Machine Learning, 2024.

BibTeX

@inproceedings{kim2024icml-dataefficient,
  title     = {{Data-Efficient Molecular Generation with Hierarchical Textual Inversion}},
  author    = {Kim, Seojin and Nam, Jaehyun and Yu, Sihyun and Shin, Younghoon and Shin, Jinwoo},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {24392--24414},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/kim2024icml-dataefficient/}
}