Dziri, Nouha

25 publications

ICLR 2025. AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text Against Web Text. Ximing Lu, Melanie Sclar, Skyler Hallinan, Niloofar Mireshghallah, Jiacheng Liu, Seungju Han, Allyson Ettinger, Liwei Jiang, Khyathi Chandu, Nouha Dziri, Yejin Choi.
NeurIPS 2025. Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond). Liwei Jiang, Yuanjun Chai, Margaret Li, Mickel Liu, Raymond Fok, Nouha Dziri, Yulia Tsvetkov, Maarten Sap, Yejin Choi.
TMLR 2025. Multi-Attribute Constraint Satisfaction via Language Model Rewriting. Ashutosh Baheti, Debanjana Chakraborty, Faeze Brahman, Ronan Le Bras, Ximing Lu, Nouha Dziri, Yejin Choi, Mark Riedl, Maarten Sap.
NeurIPS 2025. OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization. Yiyou Sun, Shawn Hu, Georgia Zhou, Ken Jiankun Zheng, Hannaneh Hajishirzi, Nouha Dziri, Dawn Song.
ICML 2025. SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior. Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine.
ICLR 2025. Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction. Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng, Cheng-Hao Liu, Sarthak Mittal, Nouha Dziri, Michael M. Bronstein, Pranam Chatterjee, Alexander Tong, Joey Bose.
AAAI 2025. To Err Is AI: A Case Study Informing LLM Flaw Reporting Practices. Sean McGregor, Allyson Ettinger, Nick Judd, Paul Albee, Liwei Jiang, Kavel Rao, William H. Smith, Shayne Longpre, Avijit Ghosh, Christopher Fiorelli, Michelle Hoang, Sven Cattell, Nouha Dziri.
NeurIPS 2025. Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations. Yiyou Sun, Yu Gai, Lijie Chen, Abhilasha Ravichander, Yejin Choi, Nouha Dziri, Dawn Song.
ICLR 2025. WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild. Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi.
ICLR 2024. Phenomenal yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement. Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren.
ICML 2024. Position: A Roadmap to Pluralistic Alignment. Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi.
NeurIPSW 2024. SafetyAnalyst: Interpretable, Transparent, and Steerable LLM Safety Moderation. Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine.
NeurIPS 2024. The Art of Saying No: Contextual Noncompliance in Language Models. Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi.
ICLR 2024. The Generative AI Paradox: “What It Can Create, It May Not Understand”. Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi.
ICLR 2024. The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi.
AAAI 2024. Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties. Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi.
NeurIPS 2024. WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs. Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri.
NeurIPS 2024. WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models. Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri.
ICMLW 2024. WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models. Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Nouha Dziri, Yejin Choi.
ICCV 2023. CHAMPAGNE: Learning Real-World Conversation from Large-Scale Web Videos. Seungju Han, Jack Hessel, Nouha Dziri, Yejin Choi, Youngjae Yu.
NeurIPS 2023. Faith and Fate: Limits of Transformers on Compositionality. Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Li, Liwei Jiang, Bill Yuchen Lin, Sean Welleck, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi.
NeurIPS 2023. Fine-Grained Human Feedback Gives Better Rewards for Language Model Training. Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi.
NeurIPS 2023. Self-Refine: Iterative Refinement with Self-Feedback. Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark.
NeurIPSW 2023. URIAL: Tuning-Free Instruction Learning and Alignment for Untuned LLMs. Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi.
ICML 2021. Decomposed Mutual Information Estimation for Contrastive Representation Learning. Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Philip Bachman, Remi Tachet Des Combes.