On Learning History-Based Policies for Controlling Markov Decision Processes

Abstract

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural networks or history-based state abstraction, perform better than their memory-less counterparts, because function approximation in Markov decision processes (MDPs) can be viewed as inducing a partially observable MDP (POMDP). However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm, and we numerically evaluate its effectiveness on a set of continuous control tasks.
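
To make the distinction drawn in the abstract concrete, the sketch below contrasts a memory-less policy, which acts on the current observation only, with a history-based policy that summarises past observations in a recurrent state. This is a minimal illustration in PyTorch, not the authors' construction; the module names, dimensions, choice of a GRU cell, and the Gaussian action head are assumptions picked for a generic continuous-control setting.

# Minimal sketch (assumed names and dimensions), not the paper's algorithm.
import torch
import torch.nn as nn
from torch.distributions import Normal


class MemorylessPolicy(nn.Module):
    """pi(a | o_t): conditions only on the current observation."""

    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, act_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        return Normal(self.net(obs), self.log_std.exp())


class HistoryBasedPolicy(nn.Module):
    """pi(a | o_1:t): compresses the observation history into a recurrent summary."""

    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim, hidden_dim)
        self.mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def initial_state(self, batch_size=1):
        return torch.zeros(batch_size, self.rnn.hidden_size)

    def forward(self, obs, hidden):
        hidden = self.rnn(obs, hidden)          # update the history summary
        return Normal(self.mean(hidden), self.log_std.exp()), hidden


if __name__ == "__main__":
    obs_dim, act_dim = 4, 2
    policy = HistoryBasedPolicy(obs_dim, act_dim)
    h = policy.initial_state()
    for t in range(5):
        obs = torch.randn(1, obs_dim)           # stand-in for an environment observation
        dist, h = policy(obs, h)                # the hidden state carries the history
        action = dist.sample()

The only structural difference is the hidden state threaded through the episode: it is what lets the history-based policy condition on more than the current observation when function approximation makes the problem effectively partially observable.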

Cite

Text

Patil et al. "On Learning History-Based Policies for Controlling Markov Decision Processes." ICML 2023 Workshops: Frontiers4LCD, 2023.

Markdown

[Patil et al. "On Learning History-Based Policies for Controlling Markov Decision Processes." ICML 2023 Workshops: Frontiers4LCD, 2023.](https://mlanthology.org/icmlw/2023/patil2023icmlw-learning/)

BibTeX

@inproceedings{patil2023icmlw-learning,
  title     = {{On Learning History-Based Policies for Controlling Markov Decision Processes}},
  author    = {Patil, Gandharv and Mahajan, Aditya and Precup, Doina},
  booktitle = {ICML 2023 Workshops: Frontiers4LCD},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/patil2023icmlw-learning/}
}