Gao, Zhaolin

8 publications

NeurIPS 2025 $Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training Jin Peng Zhou, Kaiwen Wang, Jonathan Daniel Chang, Zhaolin Gao, Nathan Kallus, Kilian Q Weinberger, Kianté Brantley, Wen Sun
NeurIPS 2025 Accelerating RL for LLM Reasoning with Optimal Advantage Regression Kianté Brantley, Mingyu Chen, Zhaolin Gao, Jason D. Lee, Wen Sun, Wenhao Zhan, Xuezhou Zhang
NeurIPS 2025 Pre-Trained Large Language Models Learn to Predict Hidden Markov Models In-Context Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun
ICLR 2025 Regressing the Relative Future: Efficient Policy Optimization for Multi-Turn RLHF Zhaolin Gao, Wenhao Zhan, Jonathan Daniel Chang, Gokul Swamy, Kianté Brantley, Jason D. Lee, Wen Sun
NeurIPS 2025 Value-Guided Search for Efficient Chain-of-Thought Reasoning Kaiwen Wang, Jin Peng Zhou, Jonathan Daniel Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun
NeurIPS 2024 REBEL: Reinforcement Learning via Regressing Relative Rewards Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
ICMLW 2024 REBEL: Reinforcement Learning via Regressing Relative Rewards Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
ICMLW 2024 REBEL: Reinforcement Learning via Regressing Relative Rewards Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun