Zhang, An
29 publications
ICLR
2026
AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
NeurIPS
2025
Fading to Grow: Growing Preference Ratios via Preference Fading Discrete Diffusion for Recommendation
ICLR
2025
Fine-Grained Verifiers: Preference Modeling as Next-Token Prediction in Vision-Language Alignment
NeurIPS
2025
Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models