Ma, Hao-Xuan

2 publications

ICLR 2026: Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment. Ruoxi Cheng, Hao-Xuan Ma, Weixin Wang, Ranjie Duan, Jiexi Liu, Xiaoshuang Jia, Simeng Qin, Xiaochun Cao, Yang Liu, Xiaojun Jia
NeurIPSW 2024: Reinforcement Learning from Multi-Role Debates as Feedback for Bias Mitigation in LLMs. Ruoxi Cheng, Hao-Xuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo