Zhan, Wenhao

17 publications

NeurIPS 2025. Accelerating RL for LLM Reasoning with Optimal Advantage Regression. Kianté Brantley, Mingyu Chen, Zhaolin Gao, Jason D. Lee, Wen Sun, Wenhao Zhan, Xuezhou Zhang.
ICLR 2025. Correcting the Mythos of KL-Regularization: Direct Alignment Without Overoptimization via Chi-Squared Preference Optimization. Audrey Huang, Wenhao Zhan, Tengyang Xie, Jason D. Lee, Wen Sun, Akshay Krishnamurthy, Dylan J. Foster.
ICLR 2025. Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank. Wenhao Zhan, Scott Fujimoto, Zheqing Zhu, Jason D. Lee, Daniel Jiang, Yonathan Efroni.
ICLR 2025. Regressing the Relative Future: Efficient Policy Optimization for Multi-Turn RLHF. Zhaolin Gao, Wenhao Zhan, Jonathan Daniel Chang, Gokul Swamy, Kianté Brantley, Jason D. Lee, Wen Sun.
COLT 2024. Optimal Multi-Distribution Learning. Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee.
ICLR 2024. Provable Offline Preference-Based Reinforcement Learning. Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun.
ICLR 2024. Provable Reward-Agnostic Preference-Based Reinforcement Learning. Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee.
ICLR 2024. Provably Efficient CVaR RL in Low-Rank MDPs. Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee.
NeurIPS 2024. REBEL: Reinforcement Learning via Regressing Relative Rewards. Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun.
ICMLW 2024. REBEL: Reinforcement Learning via Regressing Relative Rewards. Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun.
ICLR 2023. Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games. Wenhao Zhan, Jason D. Lee, Zhuoran Yang.
ICMLW 2023. How to Query Human Feedback Efficiently in RL? Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee.
ICLR 2023. PAC Reinforcement Learning for Predictive State Representations. Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee.
ICMLW 2023. Provable Offline Reinforcement Learning with Human Feedback. Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun.
NeurIPSW 2023. Provably Efficient CVaR RL in Low-Rank MDPs. Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee.
NeurIPS 2023. Reward-Agnostic Fine-Tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning. Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, Yuxin Chen.
COLT 2022. Offline Reinforcement Learning with Realizability and Single-Policy Concentrability. Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee.