Language Models in Molecular Discovery

Abstract

The success of language models, especially transformers in natural language processing, has trickled into scientific domains, giving rise to the concept of "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle, as evidenced by promising recent findings in early-stage drug discovery. In this perspective, we review the role of language models in molecular discovery, underlining their strengths and examining their weaknesses in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets to lower the entry barrier to the field of scientific language modeling. Furthermore, as a solution to some of the weaknesses we identify, we outline a vision for future molecular design that integrates a chatbot interface with available computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.

Cite

Janakarajan et al. "Language Models in Molecular Discovery." NeurIPS 2023 Workshops: AI4Science, 2023.

BibTeX

@inproceedings{janakarajan2023neuripsw-language,
  title     = {{Language Models in Molecular Discovery}},
  author    = {Janakarajan, Nikita and Erdmann, Tim and Swaminathan, Sarathkrishna and Laino, Teodoro and Born, Jannis},
  booktitle = {NeurIPS 2023 Workshops: AI4Science},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/janakarajan2023neuripsw-language/}
}