ML Anthology
Zhan, Wenhao
20 publications
NeurIPS
2025
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao, Jason D. Lee, Wen Sun, Wenhao Zhan, Xuezhou Zhang
ICLR
2025
Correcting the Mythos of KL-Regularization: Direct Alignment Without Overoptimization via Chi-Squared Preference Optimization
Audrey Huang, Wenhao Zhan, Tengyang Xie, Jason D. Lee, Wen Sun, Akshay Krishnamurthy, Dylan J Foster
ICLR
2025
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
Wenhao Zhan, Scott Fujimoto, Zheqing Zhu, Jason D. Lee, Daniel Jiang, Yonathan Efroni
ICLR
2025
Regressing the Relative Future: Efficient Policy Optimization for Multi-Turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Daniel Chang, Gokul Swamy, Kianté Brantley, Jason D. Lee, Wen Sun
COLT
2024
Optimal Multi-Distribution Learning
Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S Du, Jason D Lee
ICLR
2024
Provable Offline Preference-Based Reinforcement Learning
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
ICLR
2024
Provable Reward-Agnostic Preference-Based Reinforcement Learning
Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
ICLR
2024
Provably Efficient CVaR RL in Low-Rank MDPs
Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee
NeurIPS
2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
ICMLW
2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
ICMLW
2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
ICLR
2023
Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games
Wenhao Zhan, Jason D. Lee, Zhuoran Yang
ICMLW
2023
How to Query Human Feedback Efficiently in RL?
Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
ICMLW
2023
How to Query Human Feedback Efficiently in RL?
Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
ICLR
2023
PAC Reinforcement Learning for Predictive State Representations
Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
ICMLW
2023
Provable Offline Reinforcement Learning with Human Feedback
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
ICMLW
2023
Provable Offline Reinforcement Learning with Human Feedback
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
NeurIPSW
2023
Provably Efficient CVaR RL in Low-Rank MDPs
Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason Lee
NeurIPS
2023
Reward-Agnostic Fine-Tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
Gen Li, Wenhao Zhan, Jason Lee, Yuejie Chi, Yuxin Chen
COLT
2022
Offline Reinforcement Learning with Realizability and Single-Policy Concentrability
Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason Lee