Autoregressive Activity Prediction for Low-Data Drug Discovery

Abstract

Autoregressive modeling is the learning paradigm behind today's highly successful large language models (LLMs). For sequential tasks, such as generating natural language, autoregressive modeling is a natural choice: the sequence is generated by repeatedly appending the next token. In this work, we investigate whether the autoregressive modeling paradigm can also be used successfully for molecular activity and property prediction models, which are the counterparts of LLMs in the molecular sciences. To this end, we formulate autoregressive activity prediction modeling (AR-APM), draw connections to transductive and active learning, and assess the predictive quality of AR-APM models in few-shot learning scenarios. Our experiments show that using an existing few-shot learning system without any other changes, except switching to autoregressive mode for inference, improves ∆AUC-PR by up to ∼40%. Code is available here: https://github.com/ml-jku/autoregressive_activity_prediction.
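The abstract's core idea, carrying the autoregressive paradigm over to activity prediction, can be illustrated with a minimal sketch: instead of scoring all query molecules in one pass, the model commits its most confident prediction first, appends that molecule with its pseudo-label to the support (context) set, and repeats. The cosine-similarity kNN scorer, the function name, and the confidence criterion below are illustrative assumptions, not the paper's actual few-shot system:

```python
import numpy as np

def autoregressive_predict(support_x, support_y, query_x, k=3):
    """Hypothetical AR-APM-style inference loop: repeatedly score the
    remaining query molecules with a simple k-nearest-neighbour few-shot
    model, commit the most confident prediction as a pseudo-label, and
    append it to the support set (the autoregressive step)."""
    support_x = [np.asarray(x, dtype=float) for x in support_x]
    support_y = [float(y) for y in support_y]
    remaining = list(range(len(query_x)))
    preds = {}
    while remaining:
        sx = np.stack(support_x)
        sy = np.array(support_y)
        best = None  # (confidence, query index, predicted probability)
        for i in remaining:
            q = np.asarray(query_x[i], dtype=float)
            # cosine-similarity kNN estimate of activity probability
            sims = sx @ q / (np.linalg.norm(sx, axis=1) * np.linalg.norm(q) + 1e-9)
            prob = sy[np.argsort(sims)[-k:]].mean()
            conf = abs(prob - 0.5)  # distance from the decision boundary
            if best is None or conf > best[0]:
                best = (conf, i, prob)
        _, i, prob = best
        preds[i] = prob
        # autoregressive step: the hardened prediction joins the context
        support_x.append(np.asarray(query_x[i], dtype=float))
        support_y.append(float(prob >= 0.5))
        remaining.remove(i)
    return [preds[i] for i in range(len(query_x))]
```

In one-shot (non-autoregressive) inference, the support set stays fixed; here, later predictions condition on earlier ones, analogous to token-by-token generation in an LLM.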

Cite

Text

Schimunek et al. "Autoregressive Activity Prediction for Low-Data Drug Discovery." ICLR 2024 Workshops: PML4LRS, 2024.

Markdown

[Schimunek et al. "Autoregressive Activity Prediction for Low-Data Drug Discovery." ICLR 2024 Workshops: PML4LRS, 2024.](https://mlanthology.org/iclrw/2024/schimunek2024iclrw-autoregressive/)

BibTeX

@inproceedings{schimunek2024iclrw-autoregressive,
  title     = {{Autoregressive Activity Prediction for Low-Data Drug Discovery}},
  author    = {Schimunek, Johannes and Friedrich, Lukas and Kuhn, Daniel and Klambauer, Günter},
  booktitle = {ICLR 2024 Workshops: PML4LRS},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/schimunek2024iclrw-autoregressive/}
}