Laidlaw, Cassidy

21 publications

ICML 2025 AssistanceZero: Scalably Solving Assistance Games. Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
ICLR 2025 Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking. Cassidy Laidlaw, Shivam Singhal, Anca Dragan
ICLR 2025 Iterative Label Refinement Matters More than Preference Optimization Under Weak Supervision. Yaowen Ye, Cassidy Laidlaw, Jacob Steinhardt
ICLRW 2025 Scalably Solving Assistance Games. Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
ICMLW 2024 AssistanceZero: Scalably Solving Assistance Games. Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
ICLR 2024 Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF. Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
ICMLW 2024 Scalable Oversight by Accounting for Unreliable Feedback. Shivam Singhal, Cassidy Laidlaw, Anca Dragan
ICMLW 2024 Scalably Solving Assistance Games. Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
ICLR 2024 The Effective Horizon Explains Deep RL Performance in Stochastic Environments. Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
NeurIPSW 2023 A Theoretical Explanation of Deep RL Performance in Stochastic Environments. Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
NeurIPSW 2023 A Theoretical Explanation of Deep RL Performance in Stochastic Environments. Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
NeurIPS 2023 Bridging RL Theory and Practice with the Effective Horizon. Cassidy Laidlaw, Stuart J. Russell, Anca Dragan
ICMLW 2023 Bridging RL Theory and Practice with the Effective Horizon. Cassidy Laidlaw, Stuart Russell, Anca Dragan
ICMLW 2023 Preventing Reward Hacking with Occupancy Measure Regularization. Cassidy Laidlaw, Shivam Singhal, Anca Dragan
ICMLW 2023 Preventing Reward Hacking with Occupancy Measure Regularization. Cassidy Laidlaw, Shivam Singhal, Anca Dragan
NeurIPSW 2023 Understanding Hidden Context in Preference Learning: Consequences for RLHF. Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
NeurIPSW 2023 Understanding Hidden Context in Preference Learning: Consequences for RLHF. Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
ICLR 2022 The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models. Cassidy Laidlaw, Anca Dragan
ICLR 2021 Perceptual Adversarial Robustness: Defense Against Unseen Threat Models. Cassidy Laidlaw, Sahil Singla, Soheil Feizi
NeurIPS 2021 Uncertain Decisions Facilitate Better Preference Learning. Cassidy Laidlaw, Stuart J. Russell
NeurIPS 2019 Functional Adversarial Attacks. Cassidy Laidlaw, Soheil Feizi