Singh, Satinder P.

53 publications

NeurIPS 2023 A Definition of Continual Reinforcement Learning David Abel, Andre Barreto, Benjamin Van Roy, Doina Precup, Hado P van Hasselt, Satinder P. Singh
NeurIPS 2023 Combining Behaviors with the Successor Features Keyboard Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Lampinen, Loic Matthey, Richard L Lewis, Honglak Lee, Satinder P. Singh, Danilo Jimenez Rezende, Daniel Zoran
NeurIPS 2023 Large Language Models Can Implement Policy Iteration Ethan Brooks, Logan Walls, Richard L Lewis, Satinder P. Singh
NeurIPS 2023 Optimistic Meta-Gradients Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado P van Hasselt, András György, Satinder P. Singh
NeurIPS 2023 Structured State Space Models for In-Context Reinforcement Learning Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder P. Singh, Feryal Behbahani
NeurIPS 2022 Approximate Value Equivalence Christopher Grimm, Andre Barreto, Satinder P. Singh
NeurIPS 2022 PaLM up: Playing in the Latent Manifold for Unsupervised Pretraining Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder P. Singh
NeurIPS 2022 Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction Dilip Arumugam, Satinder P. Singh
NeurIPS 2021 Discovery of Options via Meta-Learned Subgoals Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado P van Hasselt, David Silver, Satinder P. Singh
NeurIPS 2021 Learning State Representations from Random Deep Action-Conditional Predictions Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard L Lewis, Satinder P. Singh
NeurIPS 2021 On the Expressivity of Markov Reward David Abel, Will Dabney, Anna Harutyunyan, Mark K Ho, Michael L. Littman, Doina Precup, Satinder P. Singh
NeurIPS 2021 Proper Value Equivalence Christopher Grimm, Andre Barreto, Greg Farquhar, David Silver, Satinder P. Singh
NeurIPS 2021 Reward Is Enough for Convex MDPs Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder P. Singh
NeurIPS 2020 A Self-Tuning Actor-Critic Algorithm Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado P van Hasselt, David Silver, Satinder P. Singh
NeurIPS 2020 Discovering Reinforcement Learning Algorithms Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado P van Hasselt, Satinder P. Singh, David Silver
NeurIPS 2020 Learning to Play No-Press Diplomacy with Best Response Policy Iteration Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas Hudson, Nicolas Porcel, Marc Lanctot, Julien Perolat, Richard Everett, Satinder P. Singh, Thore Graepel, Yoram Bachrach
NeurIPS 2020 Meta-Gradient Reinforcement Learning with an Objective Discovered Online Zhongwen Xu, Hado P van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder P. Singh, David Silver
NeurIPS 2020 On Efficiency in Hierarchical Reinforcement Learning Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder P. Singh
NeurIPS 2020 The Value Equivalence Principle for Model-Based Reinforcement Learning Christopher Grimm, Andre Barreto, Satinder P. Singh, David Silver
NeurIPS 2010 Reward Design via Online Gradient Ascent Jonathan Sorg, Richard L. Lewis, Satinder P. Singh
NeurIPS 2008 Simple Local Models for Complex Dynamical Systems Erik Talvitie, Satinder P. Singh
NeurIPS 2005 Off-Policy Learning with Options and Recognizers Doina Precup, Cosmin Paduraru, Anna Koop, Richard S. Sutton, Satinder P. Singh
NeurIPS 2004 Approximately Efficient Online Mechanism Design David C. Parkes, Dimah Yanovsky, Satinder P. Singh
NeurIPS 2004 Intrinsically Motivated Reinforcement Learning Nuttapong Chentanez, Andrew G. Barto, Satinder P. Singh
NeurIPS 2003 A Nonlinear Predictive State Representation Matthew R. Rudary, Satinder P. Singh
NeurIPS 2003 An MDP-Based Approach to Online Mechanism Design David C. Parkes, Satinder P. Singh
NeurIPS 2001 An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games Michael L. Littman, Michael J. Kearns, Satinder P. Singh
NeurIPS 1999 Policy Gradient Methods for Reinforcement Learning with Function Approximation Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour
NeurIPS 1999 Reinforcement Learning for Spoken Dialogue Systems Satinder P. Singh, Michael J. Kearns, Diane J. Litman, Marilyn A. Walker
NeurIPS 1998 Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes John K. Williams, Satinder P. Singh
NeurIPS 1998 Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms Michael J. Kearns, Satinder P. Singh
NeurIPS 1998 Improved Switching Among Temporally Abstract Actions Richard S. Sutton, Satinder P. Singh, Doina Precup, Balaraman Ravindran
NeurIPS 1998 Optimizing Admission Control While Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning Timothy X. Brown, Hui Tong, Satinder P. Singh
NeurIPS 1997 How to Dynamically Merge Markov Decision Processes Satinder P. Singh, David Cohn
NeurIPS 1996 Analytical Mean Squared Error Curves in Temporal Difference Learning Satinder P. Singh, Peter Dayan
COLT 1996 Learning Curve Bounds for a Markov Decision Process with Undiscounted Rewards Lawrence K. Saul, Satinder P. Singh
NeurIPS 1996 Predicting Lifetimes in Dynamically Allocated Memory David A. Cohn, Satinder P. Singh
NeurIPS 1996 Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems Satinder P. Singh, Dimitri P. Bertsekas
MLJ 1996 Reinforcement Learning with Replacing Eligibility Traces Satinder P. Singh, Richard S. Sutton
NeurIPS 1995 Improving Policies Without Measuring Merits Peter Dayan, Satinder P. Singh
COLT 1995 Markov Decision Processes in Large State Spaces Lawrence K. Saul, Satinder P. Singh
MLJ 1994 An Upper Bound on the Loss from Approximate Optimal-Value Functions Satinder P. Singh, Richard C. Yee
ICML 1994 Learning Without State-Estimation in Partially Observable Markovian Decision Processes Satinder P. Singh, Tommi S. Jaakkola, Michael I. Jordan
NeCo 1994 On the Convergence of Stochastic Iterative Dynamic Programming Algorithms Tommi S. Jaakkola, Michael I. Jordan, Satinder P. Singh
NeurIPS 1994 Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems Tommi Jaakkola, Satinder P. Singh, Michael I. Jordan
AAAI 1994 Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes Satinder P. Singh
NeurIPS 1994 Reinforcement Learning with Soft State Aggregation Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan
NeurIPS 1993 Convergence of Stochastic Iterative Dynamic Programming Algorithms Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh
NeurIPS 1993 Robust Reinforcement Learning in Motion Planning Satinder P. Singh, Andrew G. Barto, Roderic Grupen, Christopher Connolly
AAAI 1992 Reinforcement Learning with a Hierarchy of Abstract Models Satinder P. Singh
ICML 1992 Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models Satinder P. Singh
NeurIPS 1991 The Efficient Learning of Multiple Task Sequences Satinder P. Singh
ICML 1991 Transfer of Learning Across Compositions of Sequential Tasks Satinder P. Singh