ML Anthology
Bewley, Tom
9 publications
ICLR 2025
Interpreting Language Reward Models via Contrastive Explanations
Junqi Jiang, Tom Bewley, Saumitra Mishra, Freddy Lecue, Manuela Veloso
NeurIPS 2025
Representation Consistency for Accurate and Coherent LLM Answer Aggregation
Junqi Jiang, Tom Bewley, Salim I. Amoukou, Francesco Leofante, Antonio Rago, Saumitra Mishra, Francesca Toni
ICML 2025
To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
Anna Hedström, Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Manuela Veloso
ICML 2024
Counterfactual Metarules for Local and Global Recourse
Tom Bewley, Salim I. Amoukou, Saumitra Mishra, Daniele Magazzeni, Manuela Veloso
NeurIPS 2024
Sequential Harmful Shift Detection Without Labels
Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso
NeurIPS 2024
Zero-Shot Reinforcement Learning from Low Quality Data
Scott Jeen, Tom Bewley, Jonathan M. Cullen
NeurIPSW 2023
Conservative World Models
Scott Jeen, Tom Bewley, Jonathan Cullen
NeurIPS 2022
Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning
Joseph Early, Tom Bewley, Christine Evers, Sarvapali Ramchurn
AAAI 2021
TripleTree: A Versatile Interpretable Representation of Black Box Agents and Their Environments
Tom Bewley, Jonathan Lawry