Learning Biologically Relevant Features in a Pathology Foundation Model Using Sparse Autoencoders
Abstract
Pathology plays an important role in disease diagnosis, treatment decision-making and drug development. Previous work on interpretability for machine learning models on pathology images has revolved around methods such as attention value visualization and deriving human-interpretable features from model heatmaps. Mechanistic interpretability is an emerging area of model interpretability that focuses on reverse-engineering neural networks. Sparse Autoencoders (SAEs) have emerged as a promising direction for extracting monosemantic features from model activations. In this work, we train a Sparse Autoencoder on the embeddings of a pathology pretrained foundation model. We discover an interpretable sparse representation of biological concepts within the model embedding space. We investigate how these representations are associated with quantitative human-interpretable features. Our work paves the way for further exploration of interpretable feature dimensions and their utility for medical and clinical applications.
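The sketch below illustrates the general sparse autoencoder setup the abstract refers to: a linear encoder and decoder trained to reconstruct frozen foundation-model embeddings under an L1 sparsity penalty. It is a minimal illustration assuming a PyTorch implementation; the embedding dimension, expansion factor, and sparsity coefficient are placeholders, not the values used in the paper.

```python
# Minimal sparse autoencoder sketch over patch embeddings (illustrative only;
# architecture and hyperparameters are assumptions, not the paper's settings).
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 1024, expansion: int = 8, l1_coef: float = 1e-3):
        super().__init__()
        d_hidden = d_model * expansion            # overcomplete feature dictionary
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)
        self.l1_coef = l1_coef

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative and encourages sparsity
        features = torch.relu(self.encoder(x))
        recon = self.decoder(features)
        return recon, features

    def loss(self, x: torch.Tensor) -> torch.Tensor:
        recon, features = self(x)
        recon_loss = torch.mean((recon - x) ** 2)              # reconstruction term
        sparsity_loss = self.l1_coef * features.abs().mean()   # L1 sparsity penalty
        return recon_loss + sparsity_loss


# Usage sketch: embeddings would come from a frozen pathology foundation model.
sae = SparseAutoencoder(d_model=1024)
embeddings = torch.randn(32, 1024)   # placeholder batch of tile embeddings
loss = sae.loss(embeddings)
loss.backward()
```

After training, the hidden-layer features can be inspected individually, which is the basis for associating sparse dimensions with human-interpretable biological concepts.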
Cite
Text
Le et al. "Learning Biologically Relevant Features in a Pathology Foundation Model Using Sparse Autoencoders." NeurIPS 2024 Workshops: AIM-FM, 2024.
Markdown
[Le et al. "Learning Biologically Relevant Features in a Pathology Foundation Model Using Sparse Autoencoders." NeurIPS 2024 Workshops: AIM-FM, 2024.](https://mlanthology.org/neuripsw/2024/le2024neuripsw-learning/)
BibTeX
@inproceedings{le2024neuripsw-learning,
  title = {{Learning Biologically Relevant Features in a Pathology Foundation Model Using Sparse Autoencoders}},
  author = {Le, Nhat Minh and Patel, Neel and Shen, Ciyue and Martin, Blake and Eng, Alfred and Shah, Chintan and Grullon, Sean and Juyal, Dinkar},
  booktitle = {NeurIPS 2024 Workshops: AIM-FM},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/le2024neuripsw-learning/}
}