Wang, Rowan

3 publications

ICLR 2025: Tamper-Resistant Safeguards for Open-Weight LLMs. Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
NeurIPS 2024: Improving Alignment and Robustness with Circuit Breakers. Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks
ICLRW 2024: Preventing Memorized Completions Through White-Box Filtering. Oam Patel, Rowan Wang