Lange, Georg

3 publications

ICLR 2025 Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control Aleksandar Makelov, Georg Lange, Neel Nanda

ICLR 2024 Is This the Subspace You Are Looking For? An Interpretability Illusion for Subspace Activation Patching Aleksandar Makelov, Georg Lange, Atticus Geiger, Neel Nanda

ICLRW 2024 Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control Aleksandar Makelov, Georg Lange, Neel Nanda