Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Abstract
We present Universal Sparse Autoencoders (USAEs), a framework for uncovering and aligning interpretable concepts spanning multiple pretrained deep neural networks. Unlike existing concept-based interpretability methods, which focus on a single model, USAEs jointly learn a universal concept space that can reconstruct and interpret the internal activations of multiple models at once. Our core insight is to train a single, overcomplete sparse autoencoder (SAE) that ingests activations from any model and decodes them to approximate the activations of any other model under consideration. By optimizing a shared objective, the learned dictionary captures common factors of variation (concepts) across different tasks, architectures, and datasets. We show that USAEs discover semantically coherent and important universal concepts across vision models, ranging from low-level features (e.g., colors and textures) to higher-level structures (e.g., parts and objects). Overall, USAEs provide a powerful new method for interpretable cross-model analysis and offer novel applications, such as coordinated activation maximization, that open avenues for deeper insights into multi-model AI systems.
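The encode-from-one-model, decode-to-every-model idea can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the architecture, so the per-model linear encoders and decoders, the top-k sparsity rule, the squared-error loss, and every name here (`ToyUSAE`, `model_a`, `model_b`) are hypothetical. Weights are random and no training is performed.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def top_k(values, k):
    """Clamp negatives to zero, then keep only the k largest entries (assumed sparsity rule)."""
    values = [max(v, 0.0) for v in values]
    keep = set(sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyUSAE:
    """Hypothetical shared concept space with one encoder and one decoder per model."""

    def __init__(self, dims, n_concepts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Assumption: each model maps into and out of the shared space linearly.
        self.enc = {name: random_matrix(n_concepts, d, rng) for name, d in dims.items()}
        self.dec = {name: random_matrix(d, n_concepts, rng) for name, d in dims.items()}

    def encode(self, source, activation):
        """Map one model's activation vector to a sparse shared concept code."""
        return top_k(matvec(self.enc[source], activation), self.k)

    def decode(self, target, code):
        """Map a shared concept code to an approximation of the target model's activation."""
        return matvec(self.dec[target], code)

    def loss(self, source, activations):
        """Encode the source activation, then sum squared errors over all target models."""
        code = self.encode(source, activations[source])
        total = 0.0
        for target, truth in activations.items():
            recon = self.decode(target, code)
            total += sum((r - t) ** 2 for r, t in zip(recon, truth))
        return total


# Two hypothetical models with different activation widths, same input.
dims = {"model_a": 8, "model_b": 6}
usae = ToyUSAE(dims, n_concepts=32, k=4)  # 32 > 8 and 32 > 6: overcomplete
rng = random.Random(1)
acts = {name: [rng.gauss(0.0, 1.0) for _ in range(d)] for name, d in dims.items()}

code = usae.encode("model_a", acts["model_a"])
recon_b = usae.decode("model_b", code)
shared_loss = usae.loss("model_a", acts)
```

In this sketch the sparse `code` is the only thing passed between models, so any concept that helps reconstruct both `model_a` and `model_b` has to occupy the same coordinate of the shared dictionary. A real training loop would minimize `loss` over many inputs and source models.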
Cite
Text
Thasarathan et al. "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Thasarathan et al. "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/thasarathan2025icml-universal/)
BibTeX
@inproceedings{thasarathan2025icml-universal,
title = {{Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment}},
author = {Thasarathan, Harrish and Forsyth, Julian and Fel, Thomas and Kowal, Matthew and Derpanis, Konstantinos G.},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {59304-59325},
volume = {267},
url = {https://mlanthology.org/icml/2025/thasarathan2025icml-universal/}
}