Borg, Jana Schaich
6 publications
NeurIPSW
2024
SafetyAnalyst: Interpretable, Transparent, and Steerable LLM Safety Moderation
Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine