A Large Encoder-Decoder Polymer-Based Foundation Model

Abstract

Polymer representation is a persistent challenge for deep-learning models of polymer property prediction: a useful representation must balance structural accuracy with interoperability. To address this, we introduce a serialized polymer graph (SPG) notation and SPG-TED289M, an SPG-based foundation model for polymers pre-trained on a carefully curated dataset of 1 million SPG samples. To better handle the unique characteristics of SPG, we extended the tokenization process, resulting in a vocabulary of 2,407 distinct tokens. We evaluated SPG-TED289M across a range of tasks, including copolymer phase behavior, polymer membrane properties, multi-task learning, refractive index prediction, ionic conductivity, gas permeability, and glass transition temperature. The model demonstrated state-of-the-art performance in most of these areas, achieving results on par with specialized models designed for specific tasks. This indicates that SPG-TED289M, with minimal fine-tuning, can adapt effectively to complex polymer-related tasks, showcasing its robustness and versatility as a foundation model. Its flexibility and scalability make it a valuable tool for a variety of applications in polymer science.

Cite

Text

Soares et al. "A Large Encoder-Decoder Polymer-Based Foundation Model." NeurIPS 2024 Workshops: AI4Mat, 2024.

Markdown

[Soares et al. "A Large Encoder-Decoder Polymer-Based Foundation Model." NeurIPS 2024 Workshops: AI4Mat, 2024.](https://mlanthology.org/neuripsw/2024/soares2024neuripsw-large/)

BibTeX

@inproceedings{soares2024neuripsw-large,
  title     = {{A Large Encoder-Decoder Polymer-Based Foundation Model}},
  author    = {Soares, Eduardo and Park, Nathaniel and Brazil, Emilio Vital and Shirasuna, Victor Yukio},
  booktitle = {NeurIPS 2024 Workshops: AI4Mat},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/soares2024neuripsw-large/}
}