Learning When to Stop Thinking and Do Something!
Abstract
An anytime algorithm can return a response to a given task at essentially any time; typically, the quality of the response improves as more time is allowed. Here, we consider the challenge of learning when to terminate such algorithms on each of a sequence of i.i.d. tasks, so as to optimize the expected average reward per unit time. We provide an algorithm for answering this question. It combines the Cross-Entropy method, a global optimizer, with local gradient ascent, and we theoretically investigate how far the estimated gradient is from the true gradient. We empirically demonstrate the applicability of the proposed algorithm on a toy problem, as well as on a real-world face detection task.
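To make the objective concrete, the following is a minimal sketch of a Cross-Entropy search for a single fixed stopping time on a toy problem. It is not the authors' algorithm: the quality curve `1 - exp(-k * t)`, the per-task `overhead`, the population sizes, and all function names are assumptions chosen for illustration, and the gradient ascent refinement mentioned in the abstract is omitted.

```python
import math
import random
import statistics


def reward_rate(t, rates, overhead=1.0):
    """Empirical average reward per unit time when every task is stopped at time t.

    Assumed toy model: a task with speed k yields quality 1 - exp(-k * t) after
    thinking for t time units, and each task costs a fixed overhead to start.
    """
    total_reward = sum(1.0 - math.exp(-k * t) for k in rates)
    total_time = len(rates) * (t + overhead)
    return total_reward / total_time


def cross_entropy_stop_time(rates, iters=30, pop=50, elite=10, seed=0):
    """Search for a stopping time with a basic Cross-Entropy loop.

    Repeatedly samples candidate stopping times from a Gaussian, keeps the
    best-scoring ones, and refits the Gaussian to those elite samples.
    """
    rng = random.Random(seed)
    mu, sigma = 5.0, 5.0  # assumed initial sampling distribution
    for _ in range(iters):
        # Reflect negative draws so that every candidate time is positive.
        samples = [abs(rng.gauss(mu, sigma)) + 1e-6 for _ in range(pop)]
        samples.sort(key=lambda t: reward_rate(t, rates), reverse=True)
        best = samples[:elite]
        mu = statistics.mean(best)
        sigma = statistics.stdev(best) + 1e-3  # small floor keeps sampling alive
    return mu


# Hypothetical i.i.d. tasks: each task has its own improvement speed k.
task_rng = random.Random(1)
task_rates = [task_rng.uniform(0.5, 1.5) for _ in range(200)]
t_star = cross_entropy_stop_time(task_rates)
```

In this toy model, stopping too early wastes the per-task overhead and stopping too late spends time on diminishing improvements, so the reward rate peaks at an intermediate stopping time. Comparing `reward_rate(t_star, task_rates)` with the rate at a much smaller or much larger `t` shows the trade-off.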
Cite
Text
Póczos et al. "Learning When to Stop Thinking and Do Something!." International Conference on Machine Learning, 2009. doi:10.1145/1553374.1553480
Markdown
[Póczos et al. "Learning When to Stop Thinking and Do Something!." International Conference on Machine Learning, 2009.](https://mlanthology.org/icml/2009/poczos2009icml-learning/) doi:10.1145/1553374.1553480
BibTeX
@inproceedings{poczos2009icml-learning,
title = {{Learning When to Stop Thinking and Do Something!}},
author = {Póczos, Barnabás and Abbasi-Yadkori, Yasin and Szepesvári, Csaba and Greiner, Russell and Sturtevant, Nathan R.},
booktitle = {International Conference on Machine Learning},
year = {2009},
pages = {825-832},
doi = {10.1145/1553374.1553480},
url = {https://mlanthology.org/icml/2009/poczos2009icml-learning/}
}