Sehwag, Udari Madhushani

7 publications

ICLRW 2025 AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment Attacks Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang
ICLR 2025 Collab: Controlled Decoding Using Mixture of Agents for LLM Alignment Souradip Chakraborty, Sujay Bhatt, Udari Madhushani Sehwag, Soumya Suvra Ghosal, Jiahao Qiu, Mengdi Wang, Dinesh Manocha, Furong Huang, Alec Koppel, Sumitra Ganesh
ICLR 2025 GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh
ICLR 2025 SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
NeurIPSW 2024 AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang
ICMLW 2024 In-Context Learning with Topological Information for LLM-Based Knowledge Graph Completion Udari Madhushani Sehwag, Kassiani Papasotiriou, Jared Vann, Sumitra Ganesh
NeurIPSW 2024 Policy Dreamer: Diverse Public Policy Generation via Elicitation and Simulation of Human Preferences Arjun Karanam, José Ramón Enríquez, Udari Madhushani Sehwag, Michael Elabd, Kanishk Gandhi, Noah Goodman, Sanmi Koyejo