ML Anthology
Gao, Zhaolin
8 publications
NeurIPS 2025
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Peng Zhou, Kaiwen Wang, Jonathan Daniel Chang, Zhaolin Gao, Nathan Kallus, Kilian Q Weinberger, Kianté Brantley, Wen Sun
NeurIPS 2025
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao, Jason D. Lee, Wen Sun, Wenhao Zhan, Xuezhou Zhang
NeurIPS 2025
Pre-Trained Large Language Models Learn to Predict Hidden Markov Models In-Context
Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun
ICLR 2025
Regressing the Relative Future: Efficient Policy Optimization for Multi-Turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Daniel Chang, Gokul Swamy, Kianté Brantley, Jason D. Lee, Wen Sun
NeurIPS 2025
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Peng Zhou, Jonathan Daniel Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun
NeurIPS 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
ICMLW 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
ICMLW 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun