Pfohl, Stephen Robert

2 publications

NeurIPSW 2023. Reward Model Underspecification in Language Model Alignment. Jacob Eisenstein, Jonathan Berant, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alexander Nicholas D'Amour, Krishnamurthy Dj Dvijotham, Katherine A Heller, Stephen Robert Pfohl, Deepak Ramachandran

NeurIPSW 2023. Understanding Subgroup Performance Differences of Fair Predictors Using Causal Models. Stephen Robert Pfohl, Natalie Harris, Chirag Nagpal, David Madras, Vishwali Mhasawade, Olawale Elijah Salaudeen, Katherine A Heller, Sanmi Koyejo, Alexander Nicholas D'Amour