Tan, Daniel

4 publications

ICLR 2026. Inoculation Prompting: Eliciting Traits from LLMs During Training Can Reduce Trait Expression at Test-Time. Daniel Tan, Anders Cairns Woodruff, Niels Warncke, Arun Jose, Maxime Nicolas Riché, David Demitri Africa, Mia Taylor

ICLR 2026. Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives. Chloe Li, Mary Phuong, Daniel Tan

TMLR 2025. Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models. Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz

NeurIPS 2024. Analysing the Generalisation and Reliability of Steering Vectors. Daniel Tan, David Chanin, Aengus Lynch, Brooks Paige, Dimitrios Kanoulas, Adrià Garriga-Alonso, Robert Kirk