GeneBench: Systematic Evaluation of Genomic Foundation Models and Beyond
Abstract
The Genomic Foundation Model (GFM) paradigm is expected to facilitate the extraction of generalizable representations from massive genomic data, enabling their use across a spectrum of downstream applications. Despite recent advances, the lack of a standardized evaluation framework makes equitable assessment difficult, owing to differences in experimental settings, model intricacy, benchmark datasets, and reproducibility. Without standardization, comparative analyses risk becoming biased and unreliable. To address this, we introduce GeneBench, a comprehensive benchmarking suite tailored to evaluating the efficacy of Genomic Foundation Models. GeneBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We systematically evaluate datasets spanning diverse biological domains, with particular emphasis on both short-range and long-range genomic tasks, starting with the three most important DNA task areas: coding regions, non-coding regions, and genome structure. Our results on GeneBench reveal an interesting finding: regardless of parameter count, attention-based and convolution-based models differ noticeably in whether they favor short-range or long-range tasks, which could offer valuable insights for the future development of GFMs. Based on this, we propose a straightforward modified model called Genhybrid, an effective and efficient convolution-attention hybrid model suitable for all tasks.
Cite
Text
Liu et al. "GeneBench: Systematic Evaluation of Genomic Foundation Models and Beyond." NeurIPS 2024 Workshops: AIDrugX, 2024.
Markdown
[Liu et al. "GeneBench: Systematic Evaluation of Genomic Foundation Models and Beyond." NeurIPS 2024 Workshops: AIDrugX, 2024.](https://mlanthology.org/neuripsw/2024/liu2024neuripsw-genegench/)
BibTeX
@inproceedings{liu2024neuripsw-genegench,
title = {{GeneBench: Systematic Evaluation of Genomic Foundation Models and Beyond}},
author = {Liu, Zicheng and Li, Jiahui and Xin, Lei and Li, Siyuan and Yu, Chang and Zang, Zelin and Tan, Cheng and Huang, Yufei and Bai, Yajing and Xia, Jun and Li, Stan Z.},
booktitle = {NeurIPS 2024 Workshops: AIDrugX},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/liu2024neuripsw-genegench/}
}