Liu, Zhihan

13 publications

ICML 2025 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang
ICML 2025 Reward-Augmented Data Enhances Direct Preference Alignment of LLMs Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang
ICLRW 2025 Reward-Augmented Data Enhances Direct Preference Alignment of LLMs Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang
TMLR 2025 Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Shenao Zhang, Donghan Yu, Hiteshi Sharma, Han Zhong, Zhihan Liu, Ziyi Yang, Shuohang Wang, Hany Hassan Awadalla, Zhaoran Wang
NeurIPS 2024 Provably Mitigating Overoptimization in RLHF: Your SFT Loss Is Implicitly an Adversarial Regularizer Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang
ICMLW 2024 Provably Mitigating Overoptimization in RLHF: Your SFT Loss Is Implicitly an Adversarial Regularizer Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang
ICML 2024 Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
ICLR 2024 Sample-Efficient Multi-Agent RL: An Optimization Perspective Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang
ICLR 2023 Guarded Policy Optimization with Imperfect Online Demonstrations Zhenghai Xue, Zhenghao Peng, Quanyi Li, Zhihan Liu, Bolei Zhou
NeurIPS 2023 Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
NeurIPSW 2023 Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
ICML 2022 Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang
ICML 2022 Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy Zhihan Liu, Miao Lu, Zhaoran Wang, Michael Jordan, Zhuoran Yang