Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates
Abstract
We present a new, efficient PAC-optimal exploration algorithm that is able to explore in multiple continuous- or discrete-state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and it maintains PAC guarantees in real-time environments. Not only do we extend the applicability of PAC-optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single-MDP exploration bounds and a drastic improvement over previous concurrent PAC bounds. We also present TCE, a new, fine-grained metric for the cost of exploration.
Cite
Text
Pazis and Parr. "Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates." AAAI Conference on Artificial Intelligence, 2016. doi:10.1609/AAAI.V30I1.10307
Markdown
[Pazis and Parr. "Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates." AAAI Conference on Artificial Intelligence, 2016.](https://mlanthology.org/aaai/2016/pazis2016aaai-efficient/) doi:10.1609/AAAI.V30I1.10307
BibTeX
@inproceedings{pazis2016aaai-efficient,
title = {{Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates}},
author = {Pazis, Jason and Parr, Ronald},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2016},
pages = {1977--1985},
doi = {10.1609/AAAI.V30I1.10307},
url = {https://mlanthology.org/aaai/2016/pazis2016aaai-efficient/}
}