Associative Reinforcement Learning Using Linear Probabilistic Concepts
Abstract
We consider the problem of maximizing the total number of successes while learning about a probability function determining the likelihood of a success. In particular, we consider the case in which the probability function is represented by a linear function of the attribute vector associated with each action/choice. In the scenario we consider, learning proceeds in trials and in each trial, the algorithm is given a number of alternatives to choose from, each having an attribute vector associated with it, and for the alternative it selects it gets either a success or a failure with probability determined by applying a fixed but unknown linear success probability function to the attribute vector. Our algorithms consist of a learning method like the Widrow-Hoff rule and a probabilistic selection strategy which work together to resolve the so-called exploration-exploitation tradeoff. We analyze the performance of these methods by proving bounds on the worst-case regret, or how many less ...
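The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it pairs a plain Widrow-Hoff (least-mean-squares) update with a simple explore-or-exploit selection rule. The exploration schedule `1 / sqrt(t)`, the learning rate, the uniformly random attribute vectors, and all function names are assumptions made for illustration only.

```python
import random


def dot(weights, x):
    """Linear estimate of the success probability for attribute vector x."""
    return sum(w * xi for w, xi in zip(weights, x))


def widrow_hoff_update(weights, x, outcome, learning_rate):
    """One Widrow-Hoff (LMS) step: move the estimate w.x toward the observed outcome."""
    error = outcome - dot(weights, x)
    return [w + learning_rate * error * xi for w, xi in zip(weights, x)]


def select(weights, alternatives, explore_prob, rng):
    """Pick uniformly at random with probability explore_prob, else the best estimate."""
    if rng.random() < explore_prob:
        return rng.randrange(len(alternatives))
    scores = [dot(weights, x) for x in alternatives]
    return max(range(len(alternatives)), key=scores.__getitem__)


def run_trials(true_weights, n_trials, n_alternatives, learning_rate=0.1, seed=0):
    """Simulate trials against a fixed linear success probability function."""
    rng = random.Random(seed)
    dim = len(true_weights)
    weights = [0.0] * dim  # learner's estimate of the unknown linear function
    successes = 0
    for t in range(1, n_trials + 1):
        # Assumption: attribute vectors are drawn uniformly from [0, 1]^dim.
        alternatives = [[rng.random() for _ in range(dim)]
                        for _ in range(n_alternatives)]
        # Assumption: exploration probability decays as 1 / sqrt(t).
        explore_prob = min(1.0, 1.0 / t ** 0.5)
        x = alternatives[select(weights, alternatives, explore_prob, rng)]
        # Success probability from the true linear function, clipped to [0, 1].
        p = min(1.0, max(0.0, dot(true_weights, x)))
        outcome = 1 if rng.random() < p else 0
        weights = widrow_hoff_update(weights, x, outcome, learning_rate)
        successes += outcome
    return weights, successes


if __name__ == "__main__":
    learned, total = run_trials([0.3, 0.4], n_trials=2000, n_alternatives=5)
    print(learned, total)
```

Random selection early on gathers information about the unknown linear function, while greedy selection later exploits the current estimate. This is one simple way to trade exploration against exploitation, chosen here for brevity.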
Cite
Text
Abe and Long. "Associative Reinforcement Learning Using Linear Probabilistic Concepts." International Conference on Machine Learning, 1999.
Markdown
[Abe and Long. "Associative Reinforcement Learning Using Linear Probabilistic Concepts." International Conference on Machine Learning, 1999.](https://mlanthology.org/icml/1999/abe1999icml-associative/)
BibTeX
@inproceedings{abe1999icml-associative,
title = {{Associative Reinforcement Learning Using Linear Probabilistic Concepts}},
author = {Abe, Naoki and Long, Philip M.},
booktitle = {International Conference on Machine Learning},
year = {1999},
pages = {3-11},
url = {https://mlanthology.org/icml/1999/abe1999icml-associative/}
}