Lange, Georg

3 publications

ICLR 2025 Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control Aleksandar Makelov, Georg Lange, Neel Nanda
ICLR 2024 Is This the Subspace You Are Looking for? an Interpretability Illusion for Subspace Activation Patching Aleksandar Makelov, Georg Lange, Atticus Geiger, Neel Nanda
ICLRW 2024 Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control Aleksandar Makelov, Georg Lange, Neel Nanda