Sutton, Richard S.

68 publications

ICML 2025 MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-Parameters Arsalan Sharifnassab, Saber Salehkaleybar, Richard S. Sutton
ICMLW 2024 Reward Centering Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton
AAAI 2024 Reward-Respecting Subtasks for Model-Based Reinforcement Learning (Abstract Reprint) Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White
JMLR 2023 Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White
ICML 2023 Toward Efficient Gradient-Based Value Estimation Arsalan Sharifnassab, Richard S. Sutton
CoLLAs 2023 Value-Aware Importance Weighting for Off-Policy Reinforcement Learning Kristopher De Asis, Eric Graves, Richard S. Sutton
NeurIPS 2022 Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions Tian Tian, Kenny Young, Richard S. Sutton
NeurIPSW 2022 On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly Communicating MDPs Yi Wan, Richard S. Sutton
ICML 2021 Average-Reward Off-Policy Policy Evaluation with Function Approximation Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson
ICML 2021 Learning and Planning in Average-Reward Markov Decision Processes Yi Wan, Abhishek Naik, Richard S. Sutton
AAAI 2020 Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves
IJCAI 2019 Planning with Expectation Models Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton
UAI 2018 Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return Craig Sherstan, Dylan R. Ashley, Brendan Bennett, Kenny Young, Adam White, Martha White, Richard S. Sutton
AAAI 2018 Multi-Step Reinforcement Learning: A Unifying Algorithm Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton
JMLR 2018 On Generalized Bellman Equations and Temporal-Difference Learning Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton
UAI 2018 Per-Decision Multi-Step Temporal Difference Learning with Control Variates Kristopher De Asis, Richard S. Sutton
ECML-PKDD 2017 Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks Vivek Veeriah, Shangtong Zhang, Richard S. Sutton
JMLR 2016 An Emphatic Approach to the Problem of Off-Policy Temporal-Difference Learning Richard S. Sutton, A. Rupam Mahmood, Martha White
JMLR 2016 True Online Temporal-Difference Learning Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton
UAI 2015 Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity Ashique Rupam Mahmood, Richard S. Sutton
UAI 2014 Off-Policy TD(λ) with a True Online Equivalence Hado van Hasselt, Ashique Rupam Mahmood, Richard S. Sutton
NeurIPS 2014 Universal Option Models Hengshuai Yao, Csaba Szepesvári, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar
NeurIPS 2014 Weighted Importance Sampling for Off-Policy Learning with Linear Function Approximation A. Rupam Mahmood, Hado P. van Hasselt, Richard S. Sutton
ICML 2012 Linear Off-Policy Actor-Critic Thomas Degris, Martha White, Richard S. Sutton
MLJ 2012 Temporal-Difference Search in Computer Go David Silver, Richard S. Sutton, Martin Müller
ICML 2010 Toward Off-Policy Learning Control with Function Approximation Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard S. Sutton
NeurIPS 2009 Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation Hamid R. Maei, Csaba Szepesvári, Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton
ICML 2009 Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora
NeurIPS 2009 Multi-Step Dyna Planning for Policy Evaluation and Control Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári
NeurIPS 2008 A Computational Model of Hippocampal Function in Trace Conditioning Elliot A. Ludvig, Richard S. Sutton, Eric Verbeek, E. J. Kehoe
NeurIPS 2008 A Convergent $O(n)$ Temporal-Difference Algorithm for Off-Policy Learning with Linear Function Approximation Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári
UAI 2008 Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, Michael H. Bowling
ICML 2008 Sample-Based Learning and Search with Permanent and Transient Memories David Silver, Richard S. Sutton, Martin Müller
NeurIPS 2007 Incremental Natural Actor-Critic Algorithms Shalabh Bhatnagar, Mohammad Ghavamzadeh, Mark Lee, Richard S. Sutton
ICML 2007 On the Role of Tracking in Stationary Environments Richard S. Sutton, Anna Koop, David Silver
IJCAI 2007 Reinforcement Learning of Local Shape in the Game of Go David Silver, Richard S. Sutton, Martin Müller
AAAI 2006 Incremental Least-Squares Temporal Difference Learning Alborz Geramifard, Michael H. Bowling, Richard S. Sutton
NeurIPS 2006 iLSTD: Eligibility Traces and Convergence Analysis Alborz Geramifard, Michael Bowling, Martin Zinkevich, Richard S. Sutton
NeurIPS 2005 Off-Policy Learning with Options and Recognizers Doina Precup, Cosmin Paduraru, Anna Koop, Richard S. Sutton, Satinder P. Singh
ICML 2005 TD(λ) Networks: Temporal-Difference Networks with Eligibility Traces Brian Tanner, Richard S. Sutton
NeurIPS 2005 Temporal Abstraction in Temporal-Difference Networks Eddie Rafols, Anna Koop, Richard S. Sutton
IJCAI 2005 Temporal-Difference Networks with History Brian Tanner, Richard S. Sutton
IJCAI 2005 Using Predictive Representations to Improve Generalization in Reinforcement Learning Eddie J. Rafols, Mark B. Ring, Richard S. Sutton, Brian Tanner
NeurIPS 2004 Temporal-Difference Networks Richard S. Sutton, Brian Tanner
AAAI 2002 Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, July 28 - August 1, 2002, Edmonton, Alberta, Canada Rina Dechter, Michael J. Kearns, Richard S. Sutton
ICML 2001 Off-Policy Temporal Difference Learning with Function Approximation Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
NeurIPS 2001 Predictive Representations of State Michael L. Littman, Richard S. Sutton
ICML 2001 Scaling Reinforcement Learning Toward RoboCup Soccer Peter Stone, Richard S. Sutton
ICML 2000 Eligibility Traces for Off-Policy Policy Evaluation Doina Precup, Richard S. Sutton, Satinder Singh
NeurIPS 1999 Policy Gradient Methods for Reinforcement Learning with Function Approximation Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour
NeurIPS 1998 Improved Switching Among Temporally Abstract Actions Richard S. Sutton, Satinder P. Singh, Doina Precup, Balaraman Ravindran
ICML 1998 Intra-Option Learning About Temporally Abstract Actions Richard S. Sutton, Doina Precup, Satinder Singh
NeurIPS 1998 Learning Instance-Independent Value Functions to Enhance Local Search Robert Moll, Andrew G. Barto, Theodore J. Perkins, Richard S. Sutton
ECML-PKDD 1998 Theoretical Results on Reinforcement Learning with Temporally Abstract Options Doina Precup, Richard S. Sutton, Satinder Singh
ICML 1997 Exponentiated Gradient Methods for Reinforcement Learning Doina Precup, Richard S. Sutton
NeurIPS 1997 Multi-Time Models for Temporally Abstract Planning Doina Precup, Richard S. Sutton
MLJ 1996 Reinforcement Learning with Replacing Eligibility Traces Satinder P. Singh, Richard S. Sutton
NeurIPS 1995 Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding Richard S. Sutton
ICML 1995 TD Models: Modeling the World at a Mixture of Time Scales Richard S. Sutton
ICML 1993 Online Learning with Random Representations Richard S. Sutton, Steven D. Whitehead
AAAI 1992 Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta Richard S. Sutton
NeurIPS 1991 Iterative Construction of Sparse Polynomial Approximations Terence D. Sanger, Richard S. Sutton, Christopher J. Matheus
ICML 1991 Learning Polynomial Functions by Feature Construction Richard S. Sutton, Christopher J. Matheus
ICML 1991 Planning by Incremental Dynamic Programming Richard S. Sutton
ICML 1990 Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming Richard S. Sutton
NeurIPS 1990 Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming Richard S. Sutton
MLJ 1988 Learning to Predict by the Methods of Temporal Differences Richard S. Sutton
IJCAI 1985 Training and Tracking in Robotics Oliver G. Selfridge, Richard S. Sutton, Andrew G. Barto