Duenas, Derek

2 publications

ICLR 2025 AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, J Zico Kolter, Matt Fredrikson, Yarin Gal, Xander Davies
NeurIPS 2024 Improving Alignment and Robustness with Circuit Breakers Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks