ML Anthology
Varma, Vikrant
3 publications
ICML
2025
MONA: Myopic Optimization with Non-Myopic Approval Can Mitigate Multi-Step Reward Hacking
Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah
NeurIPS
2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda
ICMLW
2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda