Interpretable Prediction of DNA Replication Origins in S. Cerevisiae Using Attention-Based Motif Discovery
Abstract
In a living cell, DNA replication begins at multiple genomic sites called replication origins. Identifying these origins and their underlying base sequence composition is crucial for understanding the replication process. Existing machine learning methods for origin prediction often require labor-intensive feature engineering or lack interpretability. Here, we employ DNABERT to predict yeast replication origins and uncover sequence motifs by combining attention maps with MEME, a classical bioinformatics tool. Our approach eliminates manual feature extraction and identifies biologically relevant motifs across datasets of varying complexity. This work advances interpretable machine learning in genomics, offering a potentially generalizable framework for origin prediction and motif discovery.
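As an illustration of how attention maps might be combined with a motif finder such as MEME, the sketch below selects contiguous high-attention stretches of a sequence and formats them as FASTA records. It is a minimal sketch under stated assumptions, not the authors' pipeline: the per-base attention scores are assumed to be already extracted from the model, and the function names, the threshold, and the minimum region length are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def high_attention_regions(
    scores: Sequence[float], threshold: float, min_len: int
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where every score exceeds
    `threshold` and the run is at least `min_len` positions long.

    `threshold` and `min_len` are illustrative parameters (assumptions),
    not values taken from the paper.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i  # a new high-attention run begins here
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores)))
    return regions


def regions_to_fasta(
    seq_id: str, sequence: str, regions: List[Tuple[int, int]]
) -> str:
    """Format the selected subsequences as FASTA text, one record per region."""
    lines: List[str] = []
    for start, end in regions:
        lines.append(f">{seq_id}:{start}-{end}")
        lines.append(sequence[start:end])
    return "\n".join(lines)


# Toy input: one short sequence with made-up per-base attention scores.
sequence = "ACGTACGTAC"
scores = [0.1, 0.9, 0.8, 0.7, 0.1, 0.9, 0.1, 0.6, 0.6, 0.6]

regions = high_attention_regions(scores, threshold=0.5, min_len=3)
fasta = regions_to_fasta("seq1", sequence, regions)
print(fasta)
```

The resulting FASTA text could then be written to a file and supplied to a motif discovery tool as its input sequences. The isolated high score at position 5 is dropped because its run is shorter than the minimum length, which is one simple way to suppress single-position noise.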
Cite
Text
Piroozeh et al. "Interpretable Prediction of DNA Replication Origins in S. Cerevisiae Using Attention-Based Motif Discovery." ICLR 2025 Workshops: MLGenX, 2025.
Markdown
[Piroozeh et al. "Interpretable Prediction of DNA Replication Origins in S. Cerevisiae Using Attention-Based Motif Discovery." ICLR 2025 Workshops: MLGenX, 2025.](https://mlanthology.org/iclrw/2025/piroozeh2025iclrw-interpretable/)
BibTeX
@inproceedings{piroozeh2025iclrw-interpretable,
title = {{Interpretable Prediction of DNA Replication Origins in S. Cerevisiae Using Attention-Based Motif Discovery}},
author = {Piroozeh, Zohreh and Akerman, Ildem and Kesselheim, Stefan and Kalinina, Olga and Bazarova, Alina},
booktitle = {ICLR 2025 Workshops: MLGenX},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/piroozeh2025iclrw-interpretable/}
}