Sehwag, Udari Madhushani

9 publications

ICLR 2026. MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes. Yu Ying Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Q Knight, Harry R. Lloyd, Florence Bacus, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell L Gordon, Sydney Levine
ICLR 2026. PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach. Udari Madhushani Sehwag, Shayan Shabihi, Alex McAvoy, Vikash Sehwag, Yuancheng Xu, Dalton Towers, Furong Huang
ICLRW 2025. AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment Attacks. Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang
ICLR 2025. Collab: Controlled Decoding Using Mixture of Agents for LLM Alignment. Souradip Chakraborty, Sujay Bhatt, Udari Madhushani Sehwag, Soumya Suvra Ghosal, Jiahao Qiu, Mengdi Wang, Dinesh Manocha, Furong Huang, Alec Koppel, Sumitra Ganesh
ICLR 2025. GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment. Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh
ICLR 2025. SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal. Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
NeurIPSW 2024. AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment. Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang
ICMLW 2024. In-Context Learning with Topological Information for LLM-Based Knowledge Graph Completion. Udari Madhushani Sehwag, Kassiani Papasotiriou, Jared Vann, Sumitra Ganesh
NeurIPSW 2024. Policy Dreamer: Diverse Public Policy Generation via Elicitation and Simulation of Human Preferences. Arjun Karanam, José Ramón Enríquez, Udari Madhushani Sehwag, Michael Elabd, Kanishk Gandhi, Noah Goodman, Sanmi Koyejo