Hu, Xuyang

2 publications

ICLR 2026. Diversity-Incentivized Exploration for Versatile Reasoning. Zican Hu, Shilin Zhang, Yafu Li, Jianhao Yan, Xuyang Hu, Leyang Cui, Xiaoye Qu, Chunlin Chen, Yu Cheng, Zhi Wang
ICML 2025. Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback. Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng