ML Anthology
Ferreira, Pedro Lobato
1 publication
ICLR 2026
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Pedro Lobato Ferreira, Wilker Aziz, Ivan Titov