A Topologically Guided Machine Learning Framework for Enhanced Fine-Mapping in Whole-Genome Bacterial Studies
Abstract
This paper proposes a feature selection framework for machine learning–based bacterial genome-wide association studies aimed at uncovering resistance-causing traits. Using a well-characterized Staphylococcus aureus pangenome as a ground truth for causal‐variant labels, we demonstrate improved control for population structure and enhanced interpretability through the explicit incorporation of genomic context derived from graph-structured data, based on the compacted de Bruijn graph for an assembled pangenome. Our framework successfully uncovers resistance-causing traits for 9 of 14 antibiotics using a significantly reduced feature set, while preserving genomic marker identifiability via unique mappings between the encoded feature space and sequential representations that tag specific genomic loci.
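To make the idea of graph-derived features concrete, the following is a minimal sketch of how a compacted de Bruijn graph turns many k-mers into a smaller set of unitig features that each map back to one sequence. It is not the authors' implementation. The function names (`compact_unitigs`, `presence_matrix`), the toy genomes, the choice of k = 4, the forward-strand-only graph, and the presence/absence encoding are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of a sequence (forward strand only, an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def compact_unitigs(genomes: Dict[str, str], k: int) -> List[str]:
    """Build a node-centric de Bruijn graph over all genomes and merge
    non-branching paths into unitigs (the compacted graph's nodes)."""
    nodes: Set[str] = set()
    for seq in genomes.values():
        nodes.update(kmers(seq, k))

    def succ(u: str) -> List[str]:
        return [u[1:] + b for b in "ACGT" if u[1:] + b in nodes]

    def pred(u: str) -> List[str]:
        return [b + u[:-1] for b in "ACGT" if b + u[:-1] in nodes]

    def mergeable(u: str, v: str) -> bool:
        # Edge u -> v is compacted only if it is the sole way out of u and into v.
        return u != v and succ(u) == [v] and pred(v) == [u]

    seen: Set[str] = set()

    def walk(start: str) -> str:
        # Extend forward along mergeable edges and spell out the unitig sequence.
        path = [start]
        seen.add(start)
        while True:
            nxt = succ(path[-1])
            if len(nxt) != 1 or nxt[0] in seen or not mergeable(path[-1], nxt[0]):
                break
            path.append(nxt[0])
            seen.add(nxt[0])
        return path[0] + "".join(n[-1] for n in path[1:])

    unitigs: List[str] = []
    # First pass: start at every k-mer that cannot be merged with a predecessor.
    for u in sorted(nodes):
        p = pred(u)
        if not (len(p) == 1 and mergeable(p[0], u)):
            unitigs.append(walk(u))
    # Second pass: anything left lies on an isolated cycle with no natural start.
    for u in sorted(nodes):
        if u not in seen:
            unitigs.append(walk(u))
    return sorted(unitigs)


def presence_matrix(genomes: Dict[str, str], unitigs: List[str]) -> Dict[str, List[int]]:
    """One binary feature per unitig; column i maps back to unitigs[i]."""
    return {name: [int(u in seq) for u in unitigs] for name, seq in genomes.items()}


# Hypothetical toy genomes that differ by a single substitution.
genomes = {"g1": "ATGGCGTACTTGA", "g2": "ATGGCGAACTTGA"}
unitigs = compact_unitigs(genomes, k=4)
features = presence_matrix(genomes, unitigs)
```

In this toy case the 14 distinct 4-mers collapse into four unitigs, and the two alleles at the substitution site appear as two separate unitigs, so each feature column still identifies a specific sequence. Practical tools also handle reverse complements and much larger k, which this sketch omits.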
Cite
Text
James et al. "A Topologically Guided Machine Learning Framework for Enhanced Fine-Mapping in Whole-Genome Bacterial Studies." ICLR 2025 Workshops: MLGenX, 2025.
Markdown
[James et al. "A Topologically Guided Machine Learning Framework for Enhanced Fine-Mapping in Whole-Genome Bacterial Studies." ICLR 2025 Workshops: MLGenX, 2025.](https://mlanthology.org/iclrw/2025/james2025iclrw-topologically/)
BibTeX
@inproceedings{james2025iclrw-topologically,
title = {{A Topologically Guided Machine Learning Framework for Enhanced Fine-Mapping in Whole-Genome Bacterial Studies}},
author = {James, Tamsin Emily and Tino, Peter and Wheeler, Nicole E},
booktitle = {ICLR 2025 Workshops: MLGenX},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/james2025iclrw-topologically/}
}