Rosenblatt, Judd

2 publications

ICLRW 2024. Rethinking Harmless Refusals When Fine-Tuning Foundation Models. Florin Pop, Judd Rosenblatt, Diogo Schwerz de Lucena, Michael Vaiana
NeurIPSW 2024. Towards Safe and Honest AI Agents with Neural Self-Other Overlap. Marc Carauleanu, Michael Vaiana, Judd Rosenblatt, Cameron Berg, Diogo S de Lucena