ML Anthology
Jose, Arun
4 publications
ICLR 2026
Inoculation Prompting: Eliciting Traits from LLMs During Training Can Reduce Trait Expression at Test-Time
Daniel Tan, Anders Cairns Woodruff, Niels Warncke, Arun Jose, Maxime Nicolas Riché, David Demitri Africa, Mia Taylor
ICLR 2026
Strategic Obfuscation of Deceptive Reasoning in Language Models
Arun Jose, Niels Warncke, Mia Taylor
NeurIPS 2025
Reasoning Models Sometimes Output Illegible Chains of Thought
Arun Jose
NeurIPS 2025
Why Do Some Language Models Fake Alignment While Others Don't?
Abhay Sheshadri, John Hughes, Julian Michael, Alex Troy Mallen, Arun Jose, Fabien Roger