Golechha, Satvik

5 publications

NeurIPS 2025. A Is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders. David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Satvik Golechha, Joseph Isaac Bloom
NeurIPS 2025. Among Us: A Sandbox for Measuring and Detecting Agentic Deception. Satvik Golechha, Adrià Garriga-Alonso
ICMLW 2024. Challenges in Mechanistically Interpreting Model Representations. Satvik Golechha, James Dao
ICMLW 2024. Progress Measures for Grokking on Real-World Tasks. Satvik Golechha
NeurIPSW 2024. Training Neural Networks for Modularity Aids Interpretability. Satvik Golechha, Dylan Cope, Nandi Schoots