Periodic Agent-State Based Q-Learning for POMDPs
Abstract
The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is an agent state, which is a model-free, recursively updateable function of the observation history; examples include frame stacking and recurrent neural networks. Because the agent state is model-free, it can be used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy. Our main thesis, which we illustrate via examples, is that because the agent state does not satisfy the Markov property, non-stationary agent-state-based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state-based Q-learning), a variant of agent-state-based Q-learning that learns periodic policies. By combining ideas from periodic Markov chains and stochastic approximation, we rigorously establish that PASQL converges to a cyclic limit and characterize the approximation error of the converged periodic policy. Finally, we present a numerical experiment that highlights the salient features of PASQL and demonstrates the benefit of learning periodic policies over stationary policies.
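The paper's algorithmic details are not reproduced on this page; the following is a minimal tabular sketch of the idea described in the abstract, assuming one Q-table per phase ell = t mod L that bootstraps from the next phase's table. The toy aliased POMDP, the choice of agent state (the most recent observation), the exploration scheme, and the step sizes are illustrative assumptions rather than the paper's construction or experiments.

import numpy as np

# Sketch of a periodic agent-state Q-learning update (period L), assuming a
# tabular setting. The toy POMDP below is hypothetical: two hidden states emit
# the same observation (aliasing), and the rewarding action alternates with
# the hidden state, so a periodic policy can do what a stationary one cannot.
rng = np.random.default_rng(0)
n_obs, n_actions = 1, 2

def step(s, a):
    r = 1.0 if a == s else 0.0   # action 0 pays in state 0, action 1 in state 1
    s_next = 1 - s               # hidden state alternates deterministically
    obs = 0                      # both hidden states look identical to the agent
    return s_next, obs, r

L = 2                            # period of the policy
gamma, alpha, eps = 0.9, 0.1, 0.1
Q = np.zeros((L, n_obs, n_actions))   # one Q-table per phase ell = t mod L

s, z = 0, 0                      # hidden state and agent state (last observation)
for t in range(20000):
    ell = t % L
    # epsilon-greedy over the phase-ell Q-table
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[ell, z]))
    s, z_next, r = step(s, a)
    # periodic update: bootstrap from the Q-table of the next phase
    target = r + gamma * np.max(Q[(ell + 1) % L, z_next])
    Q[ell, z, a] += alpha * (target - Q[ell, z, a])
    z = z_next

print("greedy action at phase 0:", np.argmax(Q[0, 0]))
print("greedy action at phase 1:", np.argmax(Q[1, 0]))

In this toy run the greedy periodic policy alternates actions across phases even though the observation never changes, which is exactly the kind of behavior a stationary agent-state-based policy cannot express.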
Cite
Text
Sinha et al. "Periodic Agent-State Based Q-Learning for POMDPs." Neural Information Processing Systems, 2024. doi:10.52202/079017-1985
Markdown
[Sinha et al. "Periodic Agent-State Based Q-Learning for POMDPs." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/sinha2024neurips-periodic/) doi:10.52202/079017-1985
BibTeX
@inproceedings{sinha2024neurips-periodic,
title = {{Periodic Agent-State Based Q-Learning for POMDPs}},
author = {Sinha, Amit and Geist, Matthieu and Mahajan, Aditya},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1985},
url = {https://mlanthology.org/neurips/2024/sinha2024neurips-periodic/}
}