Siththaranjan, Anand

8 publications

ICML 2024 AI Alignment with Changing and Influenceable Reward Functions Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan
ICLRW 2024 AI Alignment with Changing and Influenceable Reward Functions Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan
ICMLW 2024 AI Alignment with Changing and Influenceable Reward Functions Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan
ICMLW 2024 AI Alignment with Changing and Influenceable Reward Functions Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan
ICLR 2024 Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
TMLR 2023 Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomek Korbak, David Lindner, Pedro Freire, Tony Tong Wang, Samuel Marks, Charbel-Raphael Segerie, Micah Carroll, Andi Peng, Phillip J.K. Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
NeurIPSW 2023 Understanding Hidden Context in Preference Learning: Consequences for RLHF Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
NeurIPSW 2023 Understanding Hidden Context in Preference Learning: Consequences for RLHF Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell