Sutton, Richard S.
68 publications
NeurIPSW
2022
On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly Communicating MDPs
UAI
2018
Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return
ECML-PKDD
2017
Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks
UAI
2015
Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity
NeurIPS
2014
Weighted Importance Sampling for Off-Policy Learning with Linear Function Approximation
ICML
2009
Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation
NeurIPS
2008
A Convergent O(n) Temporal-Difference Algorithm for Off-Policy Learning with Linear Function Approximation
NeurIPS
1995
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding