Zhang, Shenao

17 publications

ICML 2025 | BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang
ICLRW 2025 | Offline Reinforcement Learning for LLM Multi-Step Reasoning | Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu
ICML 2025 | Reward-Augmented Data Enhances Direct Preference Alignment of LLMs | Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang
ICLRW 2025 | Reward-Augmented Data Enhances Direct Preference Alignment of LLMs | Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang
TMLR 2025 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang, Donghan Yu, Hiteshi Sharma, Han Zhong, Zhihan Liu, Ziyi Yang, Shuohang Wang, Hany Hassan Awadalla, Zhaoran Wang
ICML 2024 | Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations | Feng Gao, Liangzhi Shi, Shenao Zhang, Zhaoran Wang, Yi Wu
NeurIPS 2024 | Provably Mitigating Overoptimization in RLHF: Your SFT Loss Is Implicitly an Adversarial Regularizer | Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang
ICMLW 2024 | Provably Mitigating Overoptimization in RLHF: Your SFT Loss Is Implicitly an Adversarial Regularizer | Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang
ICML 2024 | Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents | Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
ICMLW 2024 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang, Donghan Yu, Hiteshi Sharma, Ziyi Yang, Shuohang Wang, Hany Hassan Awadalla, Zhaoran Wang
ICML 2023 | Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics | Shenao Zhang, Wanxin Jin, Zhaoran Wang
CoLLAs 2023 | Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning | Shenao Zhang, Li Shen, Lei Han, Li Shen
NeurIPS 2023 | Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration | Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
NeurIPS 2023 | Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms | Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao
NeurIPSW 2023 | Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents | Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
NeurIPS 2022 | Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning | Shenao Zhang
ICLRW 2022 | Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning | Shenao Zhang, Li Shen, Lei Han, Li Shen