ML Anthology
Golechha, Satvik
5 publications
NeurIPS 2025
A Is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Satvik Golechha, Joseph Isaac Bloom
NeurIPS 2025
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
ICMLW 2024
Challenges in Mechanistically Interpreting Model Representations
Satvik Golechha, James Dao
ICMLW 2024
Progress Measures for Grokking on Real-World Tasks
Satvik Golechha
NeurIPSW 2024
Training Neural Networks for Modularity Aids Interpretability
Satvik Golechha, Dylan Cope, Nandi Schoots