Grounding QA Generation in Knowledge Graphs and Literature: A Scalable LLM Framework for Scientific Discovery
Abstract
Therapeutic biomarkers are crucial in biomedical research and clinical decision-making, yet the field lacks standardized datasets and evaluation methods for complex, context-dependent questions. To address this, we integrate large language models (LLMs) with knowledge graphs (KGs) to filter PubMed abstracts, summarize biomarker contexts, and generate a high-quality synthetic Q/A dataset. Our approach mirrors biomarker scientists' workflows, decomposing question generation into classification, named entity recognition (NER), and summarization. We release a 24k high-quality Q/A dataset and show through ablation studies that incorporating NER and summarization improves performance over using abstracts alone. Evaluating multiple LLMs, we find that while models achieve 96% accuracy on multiple-choice questions, performance drops to 69% on open-ended Q/A, highlighting the need for synthetic data to address the issue of novel discovery. By addressing a critical resource gap, this work provides a scalable tool for biomarker research and demonstrates AI’s broader potential in scientific discovery.
Cite
Text
Martell et al. "Grounding QA Generation in Knowledge Graphs and Literature: A Scalable LLM Framework for Scientific Discovery." ICLR 2025 Workshops: SynthData, 2025.
Markdown
[Martell et al. "Grounding QA Generation in Knowledge Graphs and Literature: A Scalable LLM Framework for Scientific Discovery." ICLR 2025 Workshops: SynthData, 2025.](https://mlanthology.org/iclrw/2025/martell2025iclrw-grounding/)
BibTeX
@inproceedings{martell2025iclrw-grounding,
title = {{Grounding QA Generation in Knowledge Graphs and Literature: A Scalable LLM Framework for Scientific Discovery}},
author = {Martell, Marc Boubnovski and Märtens, Kaspar and Phillips, Lawrence and Keitley, Daniel and Dermit, Maria and Fauqueur, Julien},
booktitle = {ICLR 2025 Workshops: SynthData},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/martell2025iclrw-grounding/}
}