Li, Yige
14 publications
ICLR
2026
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
CVPR
2025
AnyAttack: Towards Large-Scale Self-Supervised Adversarial Attacks on Vision-Language Models
NeurIPS
2025
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models