Tan, Daniel

4 publications

ICLR 2026. Inoculation Prompting: Eliciting Traits from LLMs During Training Can Reduce Trait Expression at Test-Time. Daniel Tan, Anders Cairns Woodruff, Niels Warncke, Arun Jose, Maxime Nicolas Riché, David Demitri Africa, Mia Taylor
ICLR 2026. Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives. Chloe Li, Mary Phuong, Daniel Tan
TMLR 2025. Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models. Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz
NeurIPS 2024. Analysing the Generalisation and Reliability of Steering Vectors. Daniel Tan, David Chanin, Aengus Lynch, Brooks Paige, Dimitrios Kanoulas, Adrià Garriga-Alonso, Robert Kirk