Towards Scientific Discovery with Dictionary Learning: Extracting Biological Concepts from Microscopy Foundation Models

Abstract

Dictionary learning (DL) has emerged as a powerful interpretability tool for large language models. By extracting known concepts (e.g., Golden Gate Bridge) from human-interpretable data (e.g., text), sparse DL can elucidate a model's inner workings. In this work, we ask if DL can also be used to discover *unknown* concepts from less human-interpretable scientific data (e.g., cell images), ultimately enabling modern approaches to scientific discovery. As a first step, we use DL algorithms to study microscopy foundation models trained on multi-cell image data, where little prior knowledge exists regarding which high-level concepts should arise. We show that sparse dictionaries indeed extract biologically meaningful concepts such as cell type and genetic perturbation type. We also propose a new DL algorithm, Iterative Codebook Feature Learning (ICFL), and combine it with a pre-processing step that uses PCA whitening from a control dataset. In our experiments, we demonstrate that both ICFL and PCA improve the selectivity or "monosemanticity" of extracted features compared to TopK sparse autoencoders.
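
The two generic ingredients named in the abstract, PCA whitening fitted on a control dataset and a TopK sparsity constraint, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and it does not implement ICFL or a trained autoencoder: the function names (`fit_pca_whitening`, `whiten`, `topk_code`), the `eps` regulariser, the synthetic data, and the choice of `k` are all assumptions made for illustration.

```python
import numpy as np

def fit_pca_whitening(control, eps=1e-6):
    """Fit a PCA whitening transform on control embeddings only (hypothetical helper)."""
    mean = control.mean(axis=0)
    centered = control - mean
    cov = centered.T @ centered / (len(control) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each principal direction (column) to unit variance on the control data.
    W = eigvecs / np.sqrt(eigvals + eps)
    return mean, W

def whiten(x, mean, W):
    """Apply the control-fitted transform to any embeddings."""
    return (x - mean) @ W

def topk_code(z, k):
    """Keep the k largest entries in each row and zero the rest (TopK activation only)."""
    idx = np.argpartition(z, -k, axis=1)[:, -k:]
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=1), axis=1)
    return out

# Synthetic stand-in for control embeddings (assumption: 500 samples, 8 dimensions).
rng = np.random.default_rng(0)
control = rng.normal(size=(500, 8)) * np.arange(1, 9)

mean, W = fit_pca_whitening(control)
white = whiten(control, mean, W)
cov_white = np.cov(white, rowvar=False)  # close to the identity matrix
codes = topk_code(white, k=3)            # at most 3 non-zero entries per row
```

In a sparse autoencoder the TopK step would typically be applied to learned encoder activations rather than directly to whitened embeddings; it is applied to `white` here only to keep the sketch self-contained.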

Cite

Text

Donhauser et al. "Towards Scientific Discovery with Dictionary Learning: Extracting Biological Concepts from Microscopy Foundation Models." NeurIPS 2024 Workshops: InterpretableAI, 2024.

Markdown

[Donhauser et al. "Towards Scientific Discovery with Dictionary Learning: Extracting Biological Concepts from Microscopy Foundation Models." NeurIPS 2024 Workshops: InterpretableAI, 2024.](https://mlanthology.org/neuripsw/2024/donhauser2024neuripsw-scientific/)

BibTeX

@inproceedings{donhauser2024neuripsw-scientific,
  title     = {{Towards Scientific Discovery with Dictionary Learning: Extracting Biological Concepts from Microscopy Foundation Models}},
  author    = {Donhauser, Konstantin and Moran, Gemma Elyse and Ravuri, Aditya and Kenyon-Dean, Kian and Ulicna, Kristina and Eastwood, Cian and Hartford, Jason},
  booktitle = {NeurIPS 2024 Workshops: InterpretableAI},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/donhauser2024neuripsw-scientific/}
}