Associative Reinforcement Learning Using Linear Probabilistic Concepts

Abstract

We consider the problem of maximizing the total number of successes while learning about a probability function that determines the likelihood of a success. In particular, we consider the case in which the probability function is a linear function of the attribute vector associated with each action (choice). Learning proceeds in trials. In each trial, the algorithm is given a number of alternatives to choose from, each with an associated attribute vector. For the alternative it selects, it receives either a success or a failure, with the success probability determined by applying a fixed but unknown linear function to that alternative's attribute vector. Our algorithms consist of a learning method like the Widrow-Hoff rule and a probabilistic selection strategy, which work together to resolve the so-called exploration-exploitation tradeoff. We analyze the performance of these methods by proving bounds on the worst-case regret, or how many fewer ...
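
The trial loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the selection strategy, so a softmax rule over estimated success probabilities is swapped in as a stand-in, and the names `widrow_hoff_update`, `softmax_choice`, and `run_trials`, together with the step size `eta`, the `temperature`, and the uniform random attribute vectors, are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def widrow_hoff_update(w, x, y, eta):
    """One Widrow-Hoff (LMS) step on the squared error (y - w.x)^2."""
    err = y - dot(w, x)
    return [wi + eta * err * xi for wi, xi in zip(w, x)]


def softmax_choice(scores, temperature, rng):
    """Assumed stand-in strategy: pick index i with probability
    proportional to exp(scores[i] / temperature)."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def run_trials(true_w, n_trials=2000, n_alternatives=5, eta=0.05,
               temperature=0.05, seed=0):
    """Simulate trials; returns (learned weights, number of successes).

    Assumes attribute vectors lie in [0, 1]^d and true_w is nonnegative
    with sum <= 1, so that true_w . x is a valid probability.
    """
    rng = random.Random(seed)
    d = len(true_w)
    w = [0.0] * d  # current estimate of the unknown linear function
    successes = 0
    for _ in range(n_trials):
        # Each alternative comes with its own attribute vector.
        alternatives = [[rng.random() for _ in range(d)]
                        for _ in range(n_alternatives)]
        scores = [dot(w, x) for x in alternatives]
        x = alternatives[softmax_choice(scores, temperature, rng)]
        # Success or failure is drawn with the true linear probability.
        y = 1.0 if rng.random() < dot(true_w, x) else 0.0
        successes += int(y)
        # Learn only from the alternative that was actually selected.
        w = widrow_hoff_update(w, x, y, eta)
    return w, successes


if __name__ == "__main__":
    learned_w, total = run_trials([0.6, 0.3, 0.1])
    print("learned weights:", [round(v, 2) for v in learned_w])
    print("successes:", total)
```

In this sketch the temperature controls the tradeoff: a small value selects the alternative with the highest estimated success probability almost every time (exploitation), while a larger value spreads selections across alternatives (exploration). The regret discussed in the abstract would compare `successes` against what a strategy that knows `true_w` could expect to obtain.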

Cite

Text

Abe and Long. "Associative Reinforcement Learning Using Linear Probabilistic Concepts." International Conference on Machine Learning, 1999.

Markdown

[Abe and Long. "Associative Reinforcement Learning Using Linear Probabilistic Concepts." International Conference on Machine Learning, 1999.](https://mlanthology.org/icml/1999/abe1999icml-associative/)

BibTeX

@inproceedings{abe1999icml-associative,
  title     = {{Associative Reinforcement Learning Using Linear Probabilistic Concepts}},
  author    = {Abe, Naoki and Long, Philip M.},
  booktitle = {International Conference on Machine Learning},
  year      = {1999},
  pages     = {3--11},
  url       = {https://mlanthology.org/icml/1999/abe1999icml-associative/}
}