Sehwag, Udari Madhushani
9 publications
ICLR
2026
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
ICLRW
2025
AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment Attacks
NeurIPSW
2024
AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment