Metcalf, Katherine

7 publications

ICML 2025 Aligning LLMs by Predicting Preferences from User Writing Samples Stéphane Aroca-Ouellette, Natalie Mackraz, Barry-John Theobald, Katherine Metcalf

ICML 2025 Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs Yinong Oliver Wang, Nivedha Sivakumar, Falaah Arif Khan, Katherine Metcalf, Adam Golinski, Natalie Mackraz, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff

AAAI 2024 Can You Rely on Synthetic Labellers in Preference-Based Reinforcement Learning? It's Complicated Katherine Metcalf, Miguel Sarabia, Masha Fedzechkina, Barry-John Theobald

ICLR 2024 Hindsight PRIORs for Reward Learning from Human Preferences Mudit Verma, Katherine Metcalf

ICML 2024 Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models Xavier Suau, Pieter Delobelle, Katherine Metcalf, Armand Joulin, Nicholas Apostoloff, Luca Zappella, Pau Rodriguez

CoRL 2023 Sample-Efficient Preference-Based Reinforcement Learning with Dynamics Aware Rewards Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald

IJCAI 2019 Unsupervised Hierarchical Temporal Abstraction by Simultaneously Learning Expectations and Representations Katherine Metcalf, David Leake