Bewley, Tom

9 publications

ICLR 2025 | Interpreting Language Reward Models via Contrastive Explanations | Junqi Jiang, Tom Bewley, Saumitra Mishra, Freddy Lecue, Manuela Veloso
NeurIPS 2025 | Representation Consistency for Accurate and Coherent LLM Answer Aggregation | Junqi Jiang, Tom Bewley, Salim I. Amoukou, Francesco Leofante, Antonio Rago, Saumitra Mishra, Francesca Toni
ICML 2025 | To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models | Anna Hedström, Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Manuela Veloso
ICML 2024 | Counterfactual Metarules for Local and Global Recourse | Tom Bewley, Salim I. Amoukou, Saumitra Mishra, Daniele Magazzeni, Manuela Veloso
NeurIPS 2024 | Sequential Harmful Shift Detection Without Labels | Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso
NeurIPS 2024 | Zero-Shot Reinforcement Learning from Low Quality Data | Scott Jeen, Tom Bewley, Jonathan M. Cullen
NeurIPSW 2023 | Conservative World Models | Scott Jeen, Tom Bewley, Jonathan Cullen
NeurIPS 2022 | Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning | Joseph Early, Tom Bewley, Christine Evers, Sarvapali Ramchurn
AAAI 2021 | TripleTree: A Versatile Interpretable Representation of Black Box Agents and Their Environments | Tom Bewley, Jonathan Lawry