Dobre, David
9 publications
ICLR
2025
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
NeurIPSW
2024
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
ICML
2024
Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features