ML Anthology
Ward, Francis Rhys
9 publications
ICLR
2025
AI Sandbagging: Language Models Can Strategically Underperform on Evaluations
Teun van der Weij, Felix Hofstätter, Oliver Jaffe, Samuel F. Brown, Francis Rhys Ward
NeurIPS
2025
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
Francis Rhys Ward, Teun van der Weij, Hanna Gábor, Sam Martin, Raja Mehta Moreno, Harel Lidar, Louis Makower, Thomas Jodrell, Lauren Robson
ICML
2025
The Elicitation Game: Evaluating Capability Elicitation Techniques
Felix Hofstätter, Teun van der Weij, Jayden Teoh, Rada Djoneva, Henning Bartsch, Francis Rhys Ward
AAAI
2025
Towards a Theory of AI Personhood
Francis Rhys Ward
NeurIPSW
2024
AI Sandbagging: Language Models Can Selectively Underperform on Evaluations
Teun van der Weij, Felix Hofstätter, Oliver Jaffe, Samuel F. Brown, Francis Rhys Ward
NeurIPSW
2024
The Elicitation Game: Stress-Testing Capability Elicitation Techniques
Felix Hofstätter, Jayden Teoh, Teun van der Weij, Francis Rhys Ward
NeurIPSW
2024
Towards a Theory of AI Personhood
Francis Rhys Ward
NeurIPSW
2024
Towards a Theory of AI Personhood
Francis Rhys Ward
NeurIPSW
2022
Towards Defining Deception in Structural Causal Games
Francis Rhys Ward