Chang, Jonathan Daniel

7 publications

NeurIPS 2025. $Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training. Jin Peng Zhou, Kaiwen Wang, Jonathan Daniel Chang, Zhaolin Gao, Nathan Kallus, Kilian Q. Weinberger, Kianté Brantley, Wen Sun.
ICLR 2025. Regressing the Relative Future: Efficient Policy Optimization for Multi-Turn RLHF. Zhaolin Gao, Wenhao Zhan, Jonathan Daniel Chang, Gokul Swamy, Kianté Brantley, Jason D. Lee, Wen Sun.
NeurIPS 2025. Value-Guided Search for Efficient Chain-of-Thought Reasoning. Kaiwen Wang, Jin Peng Zhou, Jonathan Daniel Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun.
ICLR 2024. Adversarial Imitation Learning via Boosting. Jonathan Daniel Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun.
NeurIPSW 2024. Critique-Out-Loud Reward Models. Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan Daniel Chang, Prithviraj Ammanabrolu.
ICMLW 2024. REBEL: Reinforcement Learning via Regressing Relative Rewards. Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun.
NeurIPSW 2023. Policy-Gradient Training of Language Models for Ranking. Ge Gao, Jonathan Daniel Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims.