ML Anthology
Vaiana, Michael
2 publications
ICLRW 2024
Rethinking Harmless Refusals When Fine-Tuning Foundation Models
Florin Pop, Judd Rosenblatt, Diogo Schwerz de Lucena, Michael Vaiana
NeurIPSW 2024
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Marc Carauleanu, Michael Vaiana, Judd Rosenblatt, Cameron Berg, Diogo S de Lucena