Approximate Solutions to Optimal Stopping Problems
Abstract
We propose and analyze an algorithm that approximates solutions to the problem of optimal stopping in a discounted irreducible ape(cid:173) riodic Markov chain. The scheme involves the use of linear com(cid:173) binations of fixed basis functions to approximate a Q-function. The weights of the linear combination are incrementally updated through an iterative process similar to Q-Iearning, involving sim(cid:173) ulation of the underlying Markov chain. Due to space limitations, we only provide an overview of a proof of convergence (with prob(cid:173) ability 1) and bounds on the approximation error. This is the first theoretical result that establishes the soundness of a Q-Iearning(cid:173) like algorithm when combined with arbitrary linear function ap(cid:173) proximators to solve a sequential decision problem. Though this paper focuses on the case of finite state spaces, the results extend naturally to continuous and unbounded state spaces, which are ad(cid:173) dressed in a forthcoming full-length paper.
Cite
Text
Tsitsiklis and Van Roy. "Approximate Solutions to Optimal Stopping Problems." Neural Information Processing Systems, 1996.Markdown
[Tsitsiklis and Van Roy. "Approximate Solutions to Optimal Stopping Problems." Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/tsitsiklis1996neurips-approximate/)BibTeX
@inproceedings{tsitsiklis1996neurips-approximate,
title = {{Approximate Solutions to Optimal Stopping Problems}},
author = {Tsitsiklis, John N. and Van Roy, Benjamin},
booktitle = {Neural Information Processing Systems},
year = {1996},
pages = {1082-1088},
url = {https://mlanthology.org/neurips/1996/tsitsiklis1996neurips-approximate/}
}