Rajamanoharan, Senthooran

9 publications

ICML 2025 Are Sparse Autoencoders Useful? A Case Study in Sparse Probing Subhash Kantamneni, Joshua Engels, Senthooran Rajamanoharan, Max Tegmark, Neel Nanda

ICLRW 2025 Chain-of-Thought Reasoning in the Wild Is Not Always Faithful Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy

NeurIPS 2025 Dense SAE Latents Are Features, Not Bugs Xiaoqing Sun, Alessandro Stolfo, Joshua Engels, Ben Peng Wu, Senthooran Rajamanoharan, Mrinmaya Sachan, Max Tegmark

ICLR 2025 Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models Javier Ferrando, Oscar Balcells Obeso, Senthooran Rajamanoharan, Neel Nanda

ICLRW 2025 LLM Neurosurgeon: Targeted Knowledge Removal in LLMs Using Sparse Autoencoders Kunal Patil, Dylan Zhou, Yifan Sun, Karthik Lakshmanan, Senthooran Rajamanoharan, Arthur Conmy

ICLRW 2025 Steering Fine-Tuning Generalization with Targeted Concept Ablation Helena Casademunt, Caden Juang, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda

ICLRW 2025 Steering Fine-Tuning Generalization with Targeted Concept Ablation Helena Casademunt, Caden Juang, Senthooran Rajamanoharan, Neel Nanda

NeurIPS 2024 Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

ICMLW 2024 Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda