Pona, Edoardo

2 publications

NeurIPS 2025 Abstract Counterfactuals for Language Model Agents Edoardo Pona, Milad Kazemi, Yali Du, David Watson, Nicola Paoletti
NeurIPSW 2023 Reinforcement Learning Fine-Tuning of Language Models Is Biased Towards More Extractable Features Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, Bogdan-Ionut Cirstea