On Shallow Planning Under Partial Observability

Abstract

Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability.
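The abstract centers on how the discount factor shapes the agent's planning horizon. As an illustrative sketch (not code from the paper), the snippet below computes the discounted cumulative reward and the common 1/(1-γ) rule of thumb for the effective horizon; the function names and reward stream are assumptions for illustration only.

```python
def discounted_return(rewards, gamma):
    """Discounted cumulative reward: sum over t of gamma^t * rewards[t]."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def effective_horizon(gamma):
    """Rule of thumb: rewards beyond roughly 1/(1-gamma) steps
    are heavily down-weighted in the objective."""
    return 1.0 / (1.0 - gamma)


# A constant reward stream makes the horizon effect visible:
# with gamma = 0.9 the return saturates near 1/(1-0.9) = 10,
# far below the undiscounted sum of 100.
rewards = [1.0] * 100
shallow = discounted_return(rewards, 0.9)   # close to 10
deep = discounted_return(rewards, 0.99)     # much closer to 100
```

A smaller γ thus acts as "shallow planning": the agent effectively optimizes over a shorter window, which the paper argues can reduce variance at the cost of bias, particularly under partial observability.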

Cite

Text

Lefebvre and Durand. "On Shallow Planning Under Partial Observability." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I25.34860

Markdown

[Lefebvre and Durand. "On Shallow Planning Under Partial Observability." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/lefebvre2025aaai-shallow/) doi:10.1609/AAAI.V39I25.34860

BibTeX

@inproceedings{lefebvre2025aaai-shallow,
  title     = {{On Shallow Planning Under Partial Observability}},
  author    = {Lefebvre, Randy and Durand, Audrey},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {26587--26595},
  doi       = {10.1609/AAAI.V39I25.34860},
  url       = {https://mlanthology.org/aaai/2025/lefebvre2025aaai-shallow/}
}