Gene-Centric Evaluation of Causal Variant Prediction for DNA Models

Abstract

DNA models hold significant potential for linking genetic variation to transcriptional regulation, which is crucial for understanding disease mechanisms at the genetic and molecular levels and for developing targeted therapies. Supervised approaches, such as Enformer and Basenji, have shown promising results in predicting causal variants. Recently, self-supervised models such as Nucleotide Transformer and HyenaDNA have made remarkable advances, with variant-centric benchmarks suggesting competitive performance on the variant effect prediction task. In this study, we propose also evaluating models on gene-centric benchmarks, which are often more relevant to the genetics community for mapping causal variants to affected genes.

Cite

Text

Kapourani et al. "Gene-Centric Evaluation of Causal Variant Prediction for DNA Models." ICML 2024 Workshops: ML4LMS, 2024.

Markdown

[Kapourani et al. "Gene-Centric Evaluation of Causal Variant Prediction for DNA Models." ICML 2024 Workshops: ML4LMS, 2024.](https://mlanthology.org/icmlw/2024/kapourani2024icmlw-genecentric/)

BibTeX

@inproceedings{kapourani2024icmlw-genecentric,
  title     = {{Gene-Centric Evaluation of Causal Variant Prediction for DNA Models}},
  author    = {Kapourani, Chantriolnt-Andreas and Del Vecchio, Alice and Dobrowolska, Agnieszka and Anighoro, Andrew and Hessel, Edith M. and Edwards, Lindsay and Regep, Cristian},
  booktitle = {ICML 2024 Workshops: ML4LMS},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/kapourani2024icmlw-genecentric/}
}