ML Anthology
Makelov, Aleksandar
6 publications
ICLR
2025
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ICLR
2024
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
Aleksandar Makelov, Georg Lange, Atticus Geiger, Neel Nanda
ICMLW
2024
Sparse Autoencoders Match Supervised Features for Model Steering on the IOI Task
Aleksandar Makelov
ICLRW
2024
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ICML
2023
Rethinking Backdoor Attacks
Alaa Khaddaj, Guillaume Leclerc, Aleksandar Makelov, Kristian Georgiev, Hadi Salman, Andrew Ilyas, Aleksander Madry
ICLR
2018
Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu