Dong, Junhao
20 publications
ICLR
2026
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
ICLR
2026
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
NeurIPS
2025
Robust SuperAlignment: Weak-to-Strong Robustness Generalization for Vision-Language Models
CVPR
2023
The Enemy of My Enemy Is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training