Sun, Jun
28 publications
NeurIPS
2025
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
ICML
2025
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
ICML
2025
Position: Trustworthy AI Agents Require the Integration of Large Language Models and Formal Methods
NeurIPS
2024
Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
CVPR
2023
Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection
IJCAI
2022
Learning Unforgotten Domain-Invariant Representations for Online Unsupervised Domain Adaptation
AISTATS
2020
Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation