On Shallow Planning Under Partial Observability

Abstract

Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability.
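The abstract centers on how the discount factor shapes the agent's planning horizon. As an illustrative sketch (not code from the paper), the snippet below computes the discounted cumulative reward and the common 1/(1-γ) rule of thumb for the effective horizon; the function names and reward stream are assumptions for illustration only.

```python
def discounted_return(rewards, gamma):
    """Discounted cumulative reward: sum over t of gamma^t * rewards[t]."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def effective_horizon(gamma):
    """Rule of thumb: rewards beyond roughly 1/(1-gamma) steps
    are heavily down-weighted in the objective."""
    return 1.0 / (1.0 - gamma)


# A constant reward stream makes the horizon effect visible:
# with gamma = 0.9 the return saturates near 1/(1-0.9) = 10,
# far below the undiscounted sum of 100.
rewards = [1.0] * 100
shallow = discounted_return(rewards, 0.9)   # close to 10
deep = discounted_return(rewards, 0.99)     # much closer to 100
```

A smaller γ thus acts as "shallow planning": the agent effectively optimizes over a shorter window, which the paper argues can reduce variance at the cost of bias, particularly under partial observability.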

Cite

Text

Lefebvre and Durand. "On Shallow Planning Under Partial Observability." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I25.34860

Markdown

[Lefebvre and Durand. "On Shallow Planning Under Partial Observability." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/lefebvre2025aaai-shallow/) doi:10.1609/AAAI.V39I25.34860

BibTeX

@inproceedings{lefebvre2025aaai-shallow,
  title     = {{On Shallow Planning Under Partial Observability}},
  author    = {Lefebvre, Randy and Durand, Audrey},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {26587--26595},
  doi       = {10.1609/AAAI.V39I25.34860},
  url       = {https://mlanthology.org/aaai/2025/lefebvre2025aaai-shallow/}
}