AI Agent for Data-Driven Hypothesis Exploration in Single-Cell Transcriptomics

Abstract

Large language models (LLMs) can utilize expert knowledge and simulate human thinking, which makes them potentially instrumental for a variety of scientific tasks. However, scientific data is heterogeneous and often presented as unordered tables, so bridging the gap between unstructured non-textual data and the language processing capabilities of LLMs remains an open challenge. Agentic AI offers a promising approach by enabling LLMs to interactively query datasets for relevant information. Here, we explore the application of this agentic paradigm to single-cell transcriptomic analysis, with a specific focus on cell type annotation. Our results show that when LLMs are equipped with data-querying capabilities, their performance in annotating cell types improves significantly compared to single-shot prompting. Furthermore, we provide a proof-of-concept illustration of how our method can be used to integrate diverse single-cell datasets (e.g., cell census), ensuring consistent annotation across multiple sources and facilitating meta-analysis across large sample cohorts.
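
To make the contrast with single-shot prompting concrete, the following is a minimal sketch of an agentic query loop for cell type annotation. It is not the authors' implementation: the expression table holds made-up values, `query_expression` is a hypothetical tool, `HYPOTHESES` is an assumed marker panel standing in for the model's prior knowledge, and `stub_policy` is a deterministic stand-in for an LLM so that the example runs without any external service.

```python
from typing import Callable, Dict, List, Tuple

# Toy table of mean expression per cluster (illustrative, made-up values).
EXPRESSION: Dict[str, Dict[str, float]] = {
    "cluster_0": {"CD3E": 5.1, "CD19": 0.1, "LYZ": 0.2, "MS4A1": 0.0},
    "cluster_1": {"CD3E": 0.2, "CD19": 4.3, "LYZ": 0.1, "MS4A1": 4.8},
}

# Assumed marker panels; in an agentic setup the LLM would propose these itself.
HYPOTHESES: Dict[str, List[str]] = {
    "T cell": ["CD3E"],
    "B cell": ["CD19", "MS4A1"],
    "Monocyte": ["LYZ"],
}

History = List[Tuple[str, float]]  # (hypothesized label, evidence score)


def query_expression(cluster: str, genes: List[str]) -> Dict[str, float]:
    """Hypothetical tool: return mean expression of the requested genes."""
    row = EXPRESSION[cluster]
    return {gene: row.get(gene, 0.0) for gene in genes}


def stub_policy(history: History) -> Tuple[str, str]:
    """Stand-in for an LLM: test each hypothesis once, then answer."""
    tested = {label for label, _ in history}
    for label in HYPOTHESES:
        if label not in tested:
            return ("query", label)
    best_label, _ = max(history, key=lambda item: item[1])
    return ("answer", best_label)


def annotate(
    cluster: str,
    policy: Callable[[History], Tuple[str, str]] = stub_policy,
    max_steps: int = 10,
) -> str:
    """Alternate between policy decisions and data queries until an answer."""
    history: History = []
    for _ in range(max_steps):
        action, label = policy(history)
        if action == "answer":
            return label
        values = query_expression(cluster, HYPOTHESES[label])
        score = sum(values.values()) / len(values)  # mean marker expression
        history.append((label, score))
    return "unknown"  # step budget exhausted without an answer


if __name__ == "__main__":
    for name in EXPRESSION:
        print(name, "->", annotate(name))
```

In this sketch the policy sees only the results of its own queries rather than the full table, which is the property that distinguishes the agentic setting from placing a fixed data summary in a single prompt. Replacing `stub_policy` with a call to an actual model is left open, since the source does not specify one.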

Cite

Text

Bakulin et al. "AI Agent for Data-Driven Hypothesis Exploration in Single-Cell Transcriptomics." ICLR 2025 Workshops: MLGenX, 2025.

Markdown

[Bakulin et al. "AI Agent for Data-Driven Hypothesis Exploration in Single-Cell Transcriptomics." ICLR 2025 Workshops: MLGenX, 2025.](https://mlanthology.org/iclrw/2025/bakulin2025iclrw-ai/)

BibTeX

@inproceedings{bakulin2025iclrw-ai,
  title     = {{AI Agent for Data-Driven Hypothesis Exploration in Single-Cell Transcriptomics}},
  author    = {Bakulin, Artemy and Boyeau, Pierre and Yosef, Nir},
  booktitle = {ICLR 2025 Workshops: MLGenX},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/bakulin2025iclrw-ai/}
}