MSA-LM: Integrating DNA-Level Inductive Biases into DNA Language Models
Abstract
Progress in DNA language modeling has been limited by computational cost and by the difficulty of capturing long-range dependencies in genomic data. Traditional transformer-based models are effective, but their self-attention scales quadratically with sequence length and restricts them to limited context windows, making them unsuitable for large-scale DNA modeling. Subquadratic models are efficient but often lack bidirectionality and struggle with training scalability. We introduce MSA-LM, an inductive-bias-aware subquadratic DNA Multiple Sequence Alignment (MSA) model that addresses these limitations. MSA-LM uses a bidirectional Mamba model for sequence mixing, providing transformer-like expressivity without the associated quadratic complexity. A sparse attention mechanism lets MSA-LM selectively process the main DNA sequence while incorporating evolutionary information from MSA data, significantly reducing computational overhead. Our results demonstrate that MSA-LM achieves state-of-the-art performance on long-context variant effect prediction tasks and Genomic Benchmarks, particularly excelling in regulatory sequence analysis. The proposed model surpasses existing transformer-based and subquadratic approaches in efficiency while maintaining high accuracy across diverse genomic tasks, marking a significant improvement in DNA language modeling capabilities.
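To make the idea of bidirectional, linear-time sequence mixing concrete, the following is a minimal sketch and not the author's implementation. It assumes a toy recurrence with a fixed scalar `decay`, whereas Mamba's selective scan uses learned, input-dependent parameters; the function names `linear_scan` and `bidirectional_scan` are hypothetical. It only illustrates how one forward and one backward pass give every position access to both past and future tokens at a cost linear in sequence length.

```python
from typing import List


def linear_scan(x: List[float], decay: float = 0.9) -> List[float]:
    """Toy recurrence h[t] = decay * h[t-1] + x[t], computed in one O(L) pass.

    Assumption: a fixed scalar decay stands in for the learned,
    input-dependent state-space parameters of a real Mamba layer.
    """
    h = 0.0
    out = []
    for v in x:
        h = decay * h + v
        out.append(h)
    return out


def bidirectional_scan(x: List[float], decay: float = 0.9) -> List[float]:
    """Combine a left-to-right and a right-to-left scan.

    Both passes include x[t] itself, so it is subtracted once
    to avoid counting the current token twice.
    """
    fwd = linear_scan(x, decay)
    bwd = linear_scan(x[::-1], decay)[::-1]
    return [f + b - v for f, b, v in zip(fwd, bwd, x)]


if __name__ == "__main__":
    # A single nonzero token at the end of the sequence: with a forward-only
    # scan, earlier positions would stay at zero; the backward pass lets
    # them see it.
    print(bidirectional_scan([0.0, 0.0, 1.0], decay=0.5))
```

In this toy setting the first position receives a nonzero value from a token that appears later in the sequence, which a purely causal scan could not provide.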
Cite
Text
Thoutam. "MSA-LM: Integrating DNA-Level Inductive Biases into DNA Language Models." NeurIPS 2024 Workshops: AIM-FM, 2024.
Markdown
[Thoutam. "MSA-LM: Integrating DNA-Level Inductive Biases into DNA Language Models." NeurIPS 2024 Workshops: AIM-FM, 2024.](https://mlanthology.org/neuripsw/2024/thoutam2024neuripsw-msalm/)
BibTeX
@inproceedings{thoutam2024neuripsw-msalm,
title = {{MSA-LM: Integrating DNA-Level Inductive Biases into DNA Language Models}},
author = {Thoutam, Vishrut},
booktitle = {NeurIPS 2024 Workshops: AIM-FM},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/thoutam2024neuripsw-msalm/}
}