Zhang, Shangtong

31 publications

IJCAI 2025 Counterfactual Explanations for Continuous Action Reinforcement Learning Shuyang Dong, Shangtong Zhang, Lu Feng
ICLR 2025 Doubly Optimal Policy Evaluation for Reinforcement Learning Shuze Liu, Claire Chen, Shangtong Zhang
AAAI 2025 Efficient Multi-Policy Evaluation for Reinforcement Learning Shuze Daniel Liu, Claire Chen, Shangtong Zhang
ICLR 2025 Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning Claire Chen, Shuze Liu, Shangtong Zhang
NeurIPS 2025 Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang
ICML 2025 Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set Xinyu Liu, Zixuan Xie, Shangtong Zhang
ICLR 2025 Revisiting a Design Choice in Gradient Temporal Difference Learning Xiaochi Qian, Shangtong Zhang
JMLR 2025 The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang
NeurIPS 2025 Towards Provable Emergence of In-Context Reinforcement Learning Jiuqi Wang, Rohan Chandra, Shangtong Zhang
ICLR 2025 Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning Jiuqi Wang, Ethan Blaser, Hadi Daneshmand, Shangtong Zhang
ICML 2024 Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design Shuze Liu, Shangtong Zhang
ICMLW 2024 Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning Jiuqi Wang, Ethan H Blaser, Hadi Daneshmand, Shangtong Zhang
AAAI 2023 A New Challenge in Policy Evaluation Shangtong Zhang
ICML 2023 On the Convergence of SARSA with Linear Function Approximation Shangtong Zhang, Remi Tachet des Combes, Romain Laroche
JMLR 2022 Global Optimality and Finite Sample Analysis of SoftMax Off-Policy Actor Critic Under State Distribution Mismatch Shangtong Zhang, Remi Tachet des Combes, Romain Laroche
AAAI 2022 Learning Expected Emphatic Traces for Deep RL Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt
JMLR 2022 Truncated Emphatic Temporal Difference Methods for Prediction and Control Shangtong Zhang, Shimon Whiteson
ICML 2021 Average-Reward Off-Policy Policy Evaluation with Function Approximation Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson
ICML 2021 Breaking the Deadly Triad with a Target Network Shangtong Zhang, Hengshuai Yao, Shimon Whiteson
IJCAI 2021 Deep Residual Reinforcement Learning (Extended Abstract) Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson
AAAI 2021 Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning Shangtong Zhang, Bo Liu, Shimon Whiteson
NeurIPSW 2021 StarCraft II Unplugged: Large Scale Offline Reinforcement Learning Michael Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Konrad Zolna, Richard Powell, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Kenji Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aaron van den Oord, Wojciech M. Czarnecki, Nando de Freitas, Oriol Vinyals
ICML 2020 GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values Shangtong Zhang, Bo Liu, Shimon Whiteson
NeurIPS 2020 Learning Retrospective Knowledge with Reverse Reinforcement Learning Shangtong Zhang, Vivek Veeriah, Shimon Whiteson
AAAI 2020 Mega-Reward: Achieving Human-Level Play Without Extrinsic Rewards Yuhang Song, Jianyi Wang, Thomas Lukasiewicz, Zhenghua Xu, Shangtong Zhang, Andrzej Wojcicki, Mai Xu
ICML 2020 Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson
AAAI 2019 ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search Shangtong Zhang, Hengshuai Yao
NeurIPS 2019 DAC: The Double Actor-Critic Architecture for Learning Options Shangtong Zhang, Shimon Whiteson
NeurIPS 2019 Generalized Off-Policy Actor-Critic Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson
AAAI 2019 QUOTA: The Quantile Option Architecture for Reinforcement Learning Shangtong Zhang, Hengshuai Yao
ECML-PKDD 2017 Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks Vivek Veeriah, Shangtong Zhang, Richard S. Sutton