SweetBERT: Exploring BERT-Based Models for IUPAC Glycan Nomenclature Modeling
Abstract
Glycans are the most abundant biomolecules on Earth and participate in key processes in all living organisms. The chemical variability and topological complexity of their natural branched structures have been a challenge in computational glycobiology. As a tool for improving predictive models in glycobiology, we propose SweetBERT, a BERT-based language model for encoding glycan sequences that includes explicit information about the branching structure of the sequence. This is achieved by including a pseudo-graph representation in the input embeddings. The model's performance on downstream tasks underscores the promise of Transformer architectures in addressing the complexities of glycan representation.
Cite
Text
Rubia-Rodríguez et al. "SweetBERT: Exploring BERT-Based Models for IUPAC Glycan Nomenclature Modeling." ICLR 2025 Workshops: GEM, 2025.
Markdown
[Rubia-Rodríguez et al. "SweetBERT: Exploring BERT-Based Models for IUPAC Glycan Nomenclature Modeling." ICLR 2025 Workshops: GEM, 2025.](https://mlanthology.org/iclrw/2025/rubiarodriguez2025iclrw-sweetbert/)
BibTeX
@inproceedings{rubiarodriguez2025iclrw-sweetbert,
title = {{SweetBERT: Exploring BERT-Based Models for IUPAC Glycan Nomenclature Modeling}},
author = {Rubia-Rodríguez, Irene and Nielsen, Henrik and Gippert, Garry P. and Barrett, Kristian and Henrissat, Bernard and Winther, Ole},
booktitle = {ICLR 2025 Workshops: GEM},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/rubiarodriguez2025iclrw-sweetbert/}
}