Xu, Yuancheng
20 publications
ICLR
2026
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
AAAI
2025
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
ICML
2025
PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model
ICMLW
2024
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
ICMLW
2024
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
NeurIPSW
2024
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
NeurIPS
2023
C-Disentanglement: Discovering Causally-Independent Generative Factors Under an Inductive Bias of Confounder