Ferreira, Pedro Lobato

1 publication

ICLR 2026. Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations. Pedro Lobato Ferreira, Wilker Aziz, Ivan Titov.