A Scalable LLM Framework for Therapeutic Biomarker Discovery: Grounding Q/A Generation in Knowledge Graphs and Literature

Abstract

Therapeutic biomarkers are crucial in biomedical research and clinical decision-making, yet the field lacks standardized datasets and evaluation methods for complex, context-dependent questions. To address this, we integrate large language models (LLMs) with knowledge graphs (KGs) to filter PubMed abstracts, summarize biomarker contexts, and generate a high-quality synthetic Q/A dataset. Our approach mirrors biomarker scientists' workflows, decomposing question generation into classification, named entity recognition (NER), and summarization. We release a 24k high-quality Q/A dataset and show through ablation studies that incorporating NER and summarization improves performance over using abstracts alone. Evaluating multiple LLMs, we find that while models achieve 96% accuracy on multiple-choice questions, performance drops to 69% on open-ended Q/A, highlighting the need for synthetic data to address the challenge of novel discovery. By addressing a critical resource gap, this work provides a scalable tool for biomarker research and demonstrates AI’s broader potential in scientific discovery.
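
The decomposition described in the abstract (classification, then NER, then summarization, then Q/A generation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name, the `QAItem` type, the `TOY_KG_ENTITIES` vocabulary, and the placeholder entities `GENE_A` and `DRUG_X` are hypothetical, and the toy stages below stand in for what would in practice be LLM calls and knowledge-graph lookups.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class QAItem:
    """One synthetic Q/A pair plus the intermediate context it was built from."""
    question: str
    answer: str
    entities: List[str]
    summary: str


def generate_qa(
    abstract: str,
    is_relevant: Callable[[str], bool],
    tag_entities: Callable[[str], List[str]],
    summarize: Callable[[str, List[str]], str],
    write_qa: Callable[[str, List[str]], Tuple[str, str]],
) -> Optional[QAItem]:
    """Run the staged pipeline on one abstract; return None if it is filtered out."""
    # Stage 1: classification. Drop abstracts that are not about biomarkers.
    if not is_relevant(abstract):
        return None
    # Stage 2: NER. Keep only abstracts that mention at least one known entity.
    entities = tag_entities(abstract)
    if not entities:
        return None
    # Stage 3: summarization of the biomarker context around those entities.
    summary = summarize(abstract, entities)
    # Stage 4: question/answer generation from the summary, not the raw abstract.
    question, answer = write_qa(summary, entities)
    return QAItem(question, answer, entities, summary)


# --- Toy stand-ins (assumptions for illustration only, not real models) ---
TOY_KG_ENTITIES = {"GENE_A", "DRUG_X"}  # placeholder for a knowledge-graph vocabulary


def toy_is_relevant(text: str) -> bool:
    return "biomarker" in text.lower()


def toy_tag_entities(text: str) -> List[str]:
    tokens = [tok.strip(".,;") for tok in text.split()]
    return [tok for tok in tokens if tok in TOY_KG_ENTITIES]


def toy_summarize(text: str, entities: List[str]) -> str:
    return f"Context for {', '.join(entities)}: {text}"


def toy_write_qa(summary: str, entities: List[str]) -> Tuple[str, str]:
    question = f"What does the abstract report about {entities[0]}?"
    return question, summary


if __name__ == "__main__":
    example = "GENE_A expression is a biomarker of response to DRUG_X."
    print(generate_qa(example, toy_is_relevant, toy_tag_entities,
                      toy_summarize, toy_write_qa))
```

Passing each stage in as a callable keeps the stages independently swappable, which is one way to run the kind of ablation the abstract mentions (for example, replacing `toy_summarize` with a function that returns the raw abstract unchanged).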

Cite

Text

Martell et al. "A Scalable LLM Framework for Therapeutic Biomarker Discovery: Grounding Q/A Generation in Knowledge Graphs and Literature." ICLR 2025 Workshops: MLGenX, 2025.

Markdown

[Martell et al. "A Scalable LLM Framework for Therapeutic Biomarker Discovery: Grounding Q/A Generation in Knowledge Graphs and Literature." ICLR 2025 Workshops: MLGenX, 2025.](https://mlanthology.org/iclrw/2025/martell2025iclrw-scalable/)

BibTeX

@inproceedings{martell2025iclrw-scalable,
  title     = {{A Scalable LLM Framework for Therapeutic Biomarker Discovery: Grounding Q/A Generation in Knowledge Graphs and Literature}},
  author    = {Martell, Marc Boubnovski and Märtens, Kaspar and Phillips, Lawrence and Keitley, Daniel and Dermit, Maria and Fauqueur, Julien},
  booktitle = {ICLR 2025 Workshops: MLGenX},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/martell2025iclrw-scalable/}
}