BoChemian: Large Language Model Embeddings for Bayesian Optimization of Chemical Reactions

Abstract

This paper explores the integration of Large Language Model (LLM) embeddings with Bayesian Optimization (BO) for chemical reaction optimization, with a showcase study on Buchwald-Hartwig reactions. By leveraging LLMs, we can transform textual chemical procedures into an informative feature space suitable for Bayesian optimization. Our findings show that even out-of-the-box open-source LLMs can map chemical reactions for optimization tasks, highlighting their latent specialized knowledge. The results motivate further model specialization through adaptive fine-tuning within the BO framework for on-the-fly optimization. This work serves as a foundational step toward a unified computational framework that combines textual chemical descriptions with machine-driven optimization, aiming for more efficient and accessible chemical research. The code is available at: https://github.com/schwallergroup/bochemian.
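
The pipeline described in the abstract (text procedure, then embedding, then surrogate model, then acquisition) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. Everything here is an assumption made for illustration: `embed_text` is a hypothetical hash-based stand-in for a real LLM encoder, the surrogate is a plain Gaussian process with an RBF kernel and fixed hyperparameters, the acquisition function is expected improvement, and the procedure strings and yield values in the usage example are invented placeholders rather than data from the paper.

```python
import hashlib
import math

import numpy as np


def embed_text(text, dim=16):
    """Hypothetical stand-in for an LLM encoder (assumption, not a real model).

    Maps a string to a deterministic unit-norm pseudo-embedding via a hash seed.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "little")
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_test)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.clip(1.0 - (v**2).sum(0), 1e-12, None)  # prior variance is 1 for RBF
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bo_loop(procedures, yields, n_init=3, n_iter=5, seed=0):
    """Pick which procedures to 'run' next; returns indices in evaluation order.

    `yields` plays the role of a lookup table for experiment outcomes.
    """
    x = np.stack([embed_text(p) for p in procedures])
    y = np.asarray(yields, dtype=float)
    rng = np.random.default_rng(seed)
    observed = [int(i) for i in rng.choice(len(procedures), size=n_init, replace=False)]
    for _ in range(n_iter):
        pending = [i for i in range(len(procedures)) if i not in observed]
        if not pending:
            break
        y_obs = y[observed]
        y_std = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-9)  # standardize targets
        mean, std = gp_posterior(x[observed], y_std, x[pending])
        ei = expected_improvement(mean, std, y_std.max())
        observed.append(pending[int(np.argmax(ei))])
    return observed


if __name__ == "__main__":
    # Invented placeholder procedures and yields, for illustration only.
    procedures = [
        f"Couple the aryl halide with the amine using ligand L{i} and base B{j}."
        for i in range(4)
        for j in range(3)
    ]
    toy_yields = [float((7 * i) % 11) for i in range(len(procedures))]
    order = bo_loop(procedures, toy_yields)
    print("evaluation order:", order)
    print("best toy yield found:", max(toy_yields[i] for i in order))
```

In a realistic setting, `embed_text` would be replaced by embeddings taken from an actual LLM, and the hand-rolled GP by a maintained BO library; the loop structure would stay the same.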

Cite

Text

Ranković and Schwaller. "BoChemian: Large Language Model Embeddings for Bayesian Optimization of Chemical Reactions." NeurIPS 2023 Workshops: ReALML, 2023.

Markdown

[Ranković and Schwaller. "BoChemian: Large Language Model Embeddings for Bayesian Optimization of Chemical Reactions." NeurIPS 2023 Workshops: ReALML, 2023.](https://mlanthology.org/neuripsw/2023/rankovic2023neuripsw-bochemian/)

BibTeX

@inproceedings{rankovic2023neuripsw-bochemian,
  title     = {{BoChemian: Large Language Model Embeddings for Bayesian Optimization of Chemical Reactions}},
  author    = {Ranković, Bojana and Schwaller, Philippe},
  booktitle = {NeurIPS 2023 Workshops: ReALML},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/rankovic2023neuripsw-bochemian/}
}