On Shallow Planning Under Partial Observability
Abstract
Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability.
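The abstract's central quantity, the discounted cumulative reward, can be made concrete with a minimal sketch (illustrative only, not code from the paper): the discount factor gamma induces an effective planning horizon of roughly 1/(1 - gamma), since rewards beyond that many steps contribute little to the return.

```python
def discounted_return(rewards, gamma):
    """Discounted cumulative reward: sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# A constant reward stream makes the horizon effect easy to see.
rewards = [1.0] * 100

# A shallow planner (small gamma) effectively weighs only the near future:
# the return approaches 1 / (1 - 0.5) = 2.
short_horizon = discounted_return(rewards, gamma=0.5)

# A deep planner (gamma near 1) accumulates far-off rewards:
# the return approaches 1 / (1 - 0.99) = 100 (about 63 after 100 steps).
long_horizon = discounted_return(rewards, gamma=0.99)
```

Shrinking gamma thus trades bias (ignoring distant rewards) for lower variance, the trade-off the paper studies under partial observability.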
Cite
Text
Lefebvre and Durand. "On Shallow Planning Under Partial Observability." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I25.34860

BibTeX
@inproceedings{lefebvre2025aaai-shallow,
title = {{On Shallow Planning Under Partial Observability}},
author = {Lefebvre, Randy and Durand, Audrey},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {26587--26595},
doi = {10.1609/AAAI.V39I25.34860},
url = {https://mlanthology.org/aaai/2025/lefebvre2025aaai-shallow/}
}