Role of Structural and Conformational Diversity for Machine Learning Potentials

Abstract

In the field of Machine Learning Interatomic Potentials (MLIPs), understanding the relationship between data biases, specifically conformational and structural diversity, and model generalization is critical to improving Quantum Mechanics (QM) data generation efforts. We investigate this relationship through two experiments: a fixed-budget experiment, in which the dataset size remains constant, and a fixed-molecular-set experiment, in which structural diversity is held fixed while conformational diversity varies. Our results reveal nuanced patterns in generalization metrics. Notably, optimal structural and conformational generalization requires a careful balance between structural and conformational diversity that existing QM datasets do not meet. Our results also highlight the limited ability of MLIP models to generalize beyond their training distribution, emphasizing the importance of defining the applicability domain during model deployment. These findings provide insights and guidelines for QM data generation efforts.
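
The two experimental designs described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that structural diversity is measured by the number of distinct molecules and conformational diversity by the number of conformers sampled per molecule. The function names and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def fixed_budget_splits(budget: int, molecule_counts: List[int]) -> List[Tuple[int, int]]:
    """Fixed-budget design (illustrative): the total number of samples is capped.

    For each candidate number of molecules, return the pair
    (n_molecules, conformers_per_molecule) such that
    n_molecules * conformers_per_molecule <= budget.
    More molecules therefore means fewer conformers per molecule.
    """
    splits = []
    for n_mol in molecule_counts:
        if n_mol <= 0 or n_mol > budget:
            continue  # skip settings that cannot give at least one conformer each
        splits.append((n_mol, budget // n_mol))
    return splits


def fixed_molecule_splits(n_molecules: int, conformer_counts: List[int]) -> List[Tuple[int, int, int]]:
    """Fixed-molecular-set design (illustrative): the molecule set is held constant.

    Return (n_molecules, conformers_per_molecule, total_samples) for each
    conformer count, so the dataset grows as conformational diversity grows.
    """
    return [(n_molecules, k, n_molecules * k) for k in conformer_counts if k > 0]


if __name__ == "__main__":
    # Hypothetical budget of 1000 samples, split three different ways.
    print(fixed_budget_splits(1000, [10, 100, 1000]))
    # Hypothetical fixed set of 100 molecules with 1 or 10 conformers each.
    print(fixed_molecule_splits(100, [1, 10]))
```

In the first design the two kinds of diversity trade off against each other under a constant dataset size. In the second, only conformational diversity changes, so the dataset size changes with it.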

Cite

Text

Shenoy et al. "Role of Structural and Conformational Diversity for Machine Learning Potentials." NeurIPS 2023 Workshops: AI4Science, 2023.

Markdown

[Shenoy et al. "Role of Structural and Conformational Diversity for Machine Learning Potentials." NeurIPS 2023 Workshops: AI4Science, 2023.](https://mlanthology.org/neuripsw/2023/shenoy2023neuripsw-role-a/)

BibTeX

@inproceedings{shenoy2023neuripsw-role-a,
  title     = {{Role of Structural and Conformational Diversity for Machine Learning Potentials}},
  author    = {Shenoy, Nikhil and Tossou, Prudencio and Noutahi, Emmanuel and Mary, Hadrien and Beaini, Dominique and Ding, Jiarui},
  booktitle = {NeurIPS 2023 Workshops: AI4Science},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/shenoy2023neuripsw-role-a/}
}