Varma, Vikrant

3 publications

ICML 2025 MONA: Myopic Optimization with Non-Myopic Approval Can Mitigate Multi-Step Reward Hacking Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah
NeurIPS 2024 Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda
ICMLW 2024 Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, Janos Kramar, Rohin Shah, Neel Nanda