Golechha, Satvik

5 publications

NeurIPS 2025. A Is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders. David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Satvik Golechha, Joseph Isaac Bloom

NeurIPS 2025. Among Us: A Sandbox for Measuring and Detecting Agentic Deception. Satvik Golechha, Adrià Garriga-Alonso

ICMLW 2024. Challenges in Mechanistically Interpreting Model Representations. Satvik Golechha, James Dao

ICMLW 2024. Progress Measures for Grokking on Real-World Tasks. Satvik Golechha

NeurIPSW 2024. Training Neural Networks for Modularity Aids Interpretability. Satvik Golechha, Dylan Cope, Nandi Schoots