Zhang, Xuezhou

31 publications

NeurIPS 2025. Accelerating RL for LLM Reasoning with Optimal Advantage Regression. Kianté Brantley, Mingyu Chen, Zhaolin Gao, Jason D. Lee, Wen Sun, Wenhao Zhan, Xuezhou Zhang
NeurIPS 2025. Avoiding exp(R) Scaling in RLHF Through Preference-Based Exploration. Mingyu Chen, Yiding Chen, Wen Sun, Xuezhou Zhang
AAAI 2025. Efficient Reinforcement Learning in Probabilistic Reward Machines. Xiaofeng Lin, Xuezhou Zhang
AAAI 2024. Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption. Yiding Chen, Xuezhou Zhang, Qiaomin Xie, Xiaojin Zhu
ICMLW 2024. Improved Algorithms for Adversarial Bandits with Unbounded Losses. Mingyu Chen, Xuezhou Zhang
COLT 2024. Scale-Free Adversarial Reinforcement Learning. Mingyu Chen, Xuezhou Zhang
NeurIPS 2024. State-Free Reinforcement Learning. Mingyu Chen, Aldo Pacchiano, Xuezhou Zhang
AISTATS 2023. Byzantine-Robust Online and Offline Distributed Reinforcement Learning. Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu
NeurIPS 2023. Learning Adversarial Low-Rank Markov Decision Processes with Unknown Transition and Full-Information Feedback. Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
COLT 2023. Provable Benefits of Representational Transfer in Reinforcement Learning. Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
ICML 2023. Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP. Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang
ICLR 2023. Representation Learning for Low-Rank General-Sum Markov Games. Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Zihan Ding, Chi Jin, Mengdi Wang
AISTATS 2022. Corruption-Robust Offline Reinforcement Learning. Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun
NeurIPS 2022. Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization. Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvari, Mengdi Wang
NeurIPS 2022. Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks. Shuoguang Yang, Xuezhou Zhang, Mengdi Wang
ICML 2022. Efficient Reinforcement Learning in Block MDPs: A Model-Free Representation Learning Approach. Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
ICML 2022. Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory. Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang
ICML 2022. Optimal Estimation of Policy Gradient via Double Fitted Iteration. Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang
NeurIPSW 2022. Provable Benefits of Representational Transfer in Reinforcement Learning. Alekh Agarwal, Yuda Song, Kaiwen Wang, Mengdi Wang, Wen Sun, Xuezhou Zhang
NeurIPS 2022. Provable Defense Against Backdoor Policies in Reinforcement Learning. Shubham Bharti, Xuezhou Zhang, Adish Singla, Xiaojin Zhu
ICLR 2022. Representation Learning for Online and Offline RL in Low-Rank MDPs. Masatoshi Uehara, Xuezhou Zhang, Wen Sun
NeurIPS 2021. Neural Additive Models: Interpretable Machine Learning with Neural Nets. Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey E. Hinton
ICML 2021. Robust Policy Gradient Against Strong Data Corruption. Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun
AAAI 2021. The Sample Complexity of Teaching by Reinforcement on Q-Learning. Xuezhou Zhang, Shubham Kumar Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu
ICML 2020. Adaptive Reward-Poisoning Attacks Against Reinforcement Learning. Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu
L4DC 2020. Online Data Poisoning Attacks. Xuezhou Zhang, Xiaojin Zhu, Laurent Lessard
NeurIPS 2020. Task-Agnostic Exploration in Reinforcement Learning. Xuezhou Zhang, Yuzhe Ma, Adish Singla
AISTATS 2019. An Optimal Control Approach to Sequential Machine Teaching. Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu
NeurIPS 2019. Policy Poisoning in Batch Reinforcement Learning and Control. Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu
AISTATS 2018. Teacher Improves Learning by Selecting a Training Subset. Yuzhe Ma, Robert Nowak, Philippe Rigollet, Xuezhou Zhang, Xiaojin Zhu
AAAI 2018. Training Set Debugging Using Trusted Items. Xuezhou Zhang, Xiaojin Zhu, Stephen J. Wright