Arditi, Andy

4 publications

TMLR 2025. Inverse Scaling in Test-Time Compute. Aryo Pradipta Gema, Alexander Hägele, Runjin Chen, Andy Arditi, Jacob Goldman-Wetzler, Kit Fraser-Taliente, Henry Sleight, Linda Petrini, Julian Michael, Beatrice Alex, Pasquale Minervini, Yanda Chen, Joe Benton, Ethan Perez
NeurIPS 2025. Structural Causal Bandits Under Markov Equivalence. Min Woo Park, Andy Arditi, Elias Bareinboim, Sanghack Lee
NeurIPS 2024. Refusal in Language Models Is Mediated by a Single Direction. Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda
ICMLW 2024. Refusal in Language Models Is Mediated by a Single Direction. Andy Arditi, Oscar Balcells Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda