Structure Learning for Safe Policy Improvement
Abstract
We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables.
Cite
Text
Simão and Spaan. "Structure Learning for Safe Policy Improvement." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/479Markdown
[Simão and Spaan. "Structure Learning for Safe Policy Improvement." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/simao2019ijcai-structure/) doi:10.24963/IJCAI.2019/479BibTeX
@inproceedings{simao2019ijcai-structure,
title = {{Structure Learning for Safe Policy Improvement}},
author = {Simão, Thiago D. and Spaan, Matthijs T. J.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2019},
pages = {3453-3459},
doi = {10.24963/IJCAI.2019/479},
url = {https://mlanthology.org/ijcai/2019/simao2019ijcai-structure/}
}