Polynomial-Time Reinforcement Learning of Near-Optimal Policies
Abstract
Inspired by recent results on polynomial-time reinforcement learning algorithms that accumulate near-optimal rewards, we look at the related problem of quickly learning near-optimal policies. The new problem is clearly related to the previous one, but different in important ways. We provide simple algorithms for MDPs, zero-sum and common-payoff Stochastic Games, and a uniform framework for proving their polynomial complexity. Unlike the bounds for the previously studied problem, these bounds use the minimum of the mixing time and a new quantity, the spectral radius. Unlike the previous results, ours apply uniformly to the average and discounted cases.
Cite
Text
Pivazyan and Shoham. "Polynomial-Time Reinforcement Learning of Near-Optimal Policies." AAAI Conference on Artificial Intelligence, 2002. doi:10.5555/777092.777127
Markdown
[Pivazyan and Shoham. "Polynomial-Time Reinforcement Learning of Near-Optimal Policies." AAAI Conference on Artificial Intelligence, 2002.](https://mlanthology.org/aaai/2002/pivazyan2002aaai-polynomial/) doi:10.5555/777092.777127
BibTeX
@inproceedings{pivazyan2002aaai-polynomial,
title = {{Polynomial-Time Reinforcement Learning of Near-Optimal Policies}},
author = {Pivazyan, Karèn and Shoham, Yoav},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2002},
  pages = {205--210},
doi = {10.5555/777092.777127},
url = {https://mlanthology.org/aaai/2002/pivazyan2002aaai-polynomial/}
}