ML Anthology
Rajamanoharan, Senthooran
9 publications
ICML
2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni, Joshua Engels, Senthooran Rajamanoharan, Max Tegmark, Neel Nanda
ICLRW
2025
Chain-of-Thought Reasoning in the Wild Is Not Always Faithful
Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy
NeurIPS
2025
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels, Ben Peng Wu, Senthooran Rajamanoharan, Mrinmaya Sachan, Max Tegmark
ICLR
2025
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Balcells Obeso, Senthooran Rajamanoharan, Neel Nanda
ICLRW
2025
LLM Neurosurgeon: Targeted Knowledge Removal in LLMs Using Sparse Autoencoders
Kunal Patil, Dylan Zhou, Yifan Sun, Karthik Lakshmanan, Senthooran Rajamanoharan, Arthur Conmy
ICLRW
2025
Steering Fine-Tuning Generalization with Targeted Concept Ablation
Helena Casademunt, Caden Juang, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda
ICLRW
2025
Steering Fine-Tuning Generalization with Targeted Concept Ablation
Helena Casademunt, Caden Juang, Senthooran Rajamanoharan, Neel Nanda
NeurIPS
2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda
ICMLW
2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, Janos Kramar, Rohin Shah, Neel Nanda