Murthy, Sonia Krishna

3 publications

ICLR 2026. Cognitive Models Can Reveal Interpretable Value Trade-Offs in Language Models. Sonia Krishna Murthy, Rosie Zhao, Jennifer Hu, Sham M. Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman

ICLR 2026. Priors in Time: Missing Inductive Biases for Language Model Interpretability. Ekdeep Singh Lubana, Can Rager, Sai Sumedh R. Hindupur, Valérie Costa, Oam Patel, Sonia Krishna Murthy, Thomas Fel, Greta Tuckute, Daniel Wurgaft, Eric Bigelow, Demba E. Ba, Melanie Weber, Aaron Mueller

ICMLW 2023. Comparing the Evaluation and Production of Loophole Behavior in Children and Large Language Models. Sonia Krishna Murthy, Sophie Bridgers, Kiera Maria Parece, Elena Glassman, Tomer Ullman