Lee, Bruce W.

3 publications

NeurIPS 2025. Distillation Robustifies Unlearning. Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish K Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Matt Turner
ICLR 2025. Programming Refusal with Conditional Activation Steering. Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar
NeurIPS 2025. Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs. Mantas Mazeika, Xuwang Yin, Rishub Tamirisa, Jaehyuk Lim, Bruce W. Lee, Richard Ren, Long Phan, Norman Mu, Oliver Zhang, Dan Hendrycks