On Learning History-Based Policies for Controlling Markov Decision Processes
Abstract
Reinforcement learning (RL) folklore suggests that methods of function approximation based on history, such as recurrent neural networks or state abstractions that include past information, outperform those without memory, because function approximation in Markov decision processes (MDPs) can create a situation akin to dealing with a partially observable MDP (POMDP). However, formal analysis of history-based algorithms has been limited, with most existing frameworks concentrating on features without historical context. In this paper, we introduce a theoretical framework to examine the behaviour of RL algorithms that control an MDP using feature abstraction mappings based on historical data. Additionally, we leverage this framework to develop a practical RL algorithm, and we assess its performance across various continuous control tasks.
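To make the idea of a history-based feature abstraction concrete, here is a minimal sketch (not the paper's algorithm): a recurrent encoder folds the observation history into a fixed-size feature vector, and a softmax policy acts on that feature. All class names, shapes, and the tanh recurrence are illustrative assumptions.

```python
import numpy as np

class HistoryEncoder:
    """Recurrent feature abstraction: folds the observation history
    into a fixed-size feature vector (an information-state surrogate).
    Weights and the tanh recurrence are illustrative, not from the paper."""

    def __init__(self, obs_dim, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (feat_dim, obs_dim))
        self.W_rec = rng.normal(0.0, 0.1, (feat_dim, feat_dim))
        self.b = np.zeros(feat_dim)

    def encode(self, history):
        """Fold the history [o_1, ..., o_t] into one feature vector z_t."""
        z = np.zeros_like(self.b)
        for obs in history:
            z = np.tanh(self.W_in @ obs + self.W_rec @ z + self.b)
        return z

class HistoryPolicy:
    """Softmax policy defined on the history-based features."""

    def __init__(self, encoder, n_actions, seed=1):
        rng = np.random.default_rng(seed)
        self.encoder = encoder
        self.W_pi = rng.normal(0.0, 0.1, (n_actions, encoder.b.shape[0]))

    def action_probs(self, history):
        z = self.encoder.encode(history)
        logits = self.W_pi @ z
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()

# Usage: a 3-step history of 4-dim observations, 2 discrete actions.
enc = HistoryEncoder(obs_dim=4, feat_dim=8)
pi = HistoryPolicy(enc, n_actions=2)
hist = [np.ones(4), np.zeros(4), np.ones(4)]
p = pi.action_probs(hist)  # a valid probability distribution over actions
```

In practice the encoder and policy weights would be trained jointly by a policy-gradient or actor-critic method; the sketch only shows how a history, rather than the current observation alone, parameterizes the policy.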
Cite
Text
Patil et al. "On Learning History-Based Policies for Controlling Markov Decision Processes." Artificial Intelligence and Statistics, 2024.

Markdown

[Patil et al. "On Learning History-Based Policies for Controlling Markov Decision Processes." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/patil2024aistats-learning/)

BibTeX
@inproceedings{patil2024aistats-learning,
title = {{On Learning History-Based Policies for Controlling Markov Decision Processes}},
author = {Patil, Gandharv and Mahajan, Aditya and Precup, Doina},
booktitle = {Artificial Intelligence and Statistics},
year = {2024},
pages = {3511--3519},
volume = {238},
url = {https://mlanthology.org/aistats/2024/patil2024aistats-learning/}
}