Khoja, Adam
1 publication
NeurIPS
2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren, Steven Basart, Adam Khoja, Alexander Pan, Alice Gatti, Long Phan, Xuwang Yin, Mantas Mazeika, Gabriel Mukobi, Ryan Hwang Kim, Stephen Fitz, Dan Hendrycks