Partially Observable Reference Policy Programming

Abstract

This paper proposes Partially Observable Reference Policy Programming, a novel anytime online approximate POMDP solver that samples meaningful future histories very deeply while simultaneously forcing a gradual policy update. We provide theoretical guarantees for the algorithm's underlying scheme, showing that the performance loss is bounded by the average of the sampling approximation errors rather than the usual maximum; a crucial requirement given the sampling sparsity of online planning. Empirical evaluations on two large-scale problems with dynamically evolving environments—including a helicopter emergency scenario in the Corsica region requiring approximately 150 planning steps—corroborate the theoretical results and indicate that our solver considerably outperforms current online benchmarks.

Cite

Text

Kim and Kurniawati. "Partially Observable Reference Policy Programming." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/949

Markdown

[Kim and Kurniawati. "Partially Observable Reference Policy Programming." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/kim2025ijcai-partially/) doi:10.24963/IJCAI.2025/949

BibTeX

@inproceedings{kim2025ijcai-partially,
  title     = {{Partially Observable Reference Policy Programming}},
  author    = {Kim, Edward and Kurniawati, Hanna},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8536--8543},
  doi       = {10.24963/IJCAI.2025/949},
  url       = {https://mlanthology.org/ijcai/2025/kim2025ijcai-partially/}
}