The Steering Approach for Multi-Criteria Reinforcement Learning
Abstract
We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem is discussed as well.
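The abstract describes a steering scheme: track the running average reward vector, and whenever it lies outside the target set, follow the scalar-reward learner whose expected reward has the largest projection on the direction pointing from the current average toward the closest point of the set (the approachability, or Blackwell, direction). The sketch below is a minimal, hypothetical illustration of this switching logic on a toy two-armed, vector-reward problem; the environment, the box-shaped target set, and the per-arm mean rewards are assumptions made for illustration, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(arm):
    # Toy vector-valued reward: arm 0 favors coordinate 0, arm 1 favors coordinate 1.
    if arm == 0:
        return np.array([1.0 + 0.1 * rng.standard_normal(), 0.0])
    return np.array([0.0, 1.0 + 0.1 * rng.standard_normal()])

def project_to_target(v, lo=0.4, hi=0.6):
    # Assumed target set: average reward vectors with both coordinates in [lo, hi].
    # For a box, the Euclidean projection is a coordinate-wise clip.
    return np.clip(v, lo, hi)

# Per-arm mean reward vectors (known here for simplicity; in the paper each
# "direction" would instead be handled by a scalar-reward learning algorithm).
expected = np.array([[1.0, 0.0],
                     [0.0, 1.0]])

avg = np.zeros(2)
for t in range(1, 5001):
    direction = project_to_target(avg) - avg   # steering direction toward the set
    if np.allclose(direction, 0.0):
        arm = int(rng.integers(2))             # inside the target set: any action is fine
    else:
        # Blackwell-style choice: pick the arm whose expected reward
        # has the largest projection on the steering direction.
        arm = int(np.argmax(expected @ direction))
    avg += (reward(arm) - avg) / t             # running average of the reward vector

print("final average reward:", avg)            # should approach the target box
```

With these assumptions the feasible averages lie near the segment between (1, 0) and (0, 1), which intersects the target box, so the running average is steered toward roughly (0.5, 0.5); the paper's algorithm replaces the known per-arm means with scalar-reward learners and gives sufficient conditions under which this steering converges for general target sets.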
Cite
Text
Mannor and Shimkin. "The Steering Approach for Multi-Criteria Reinforcement Learning." Neural Information Processing Systems, 2001.
Markdown
[Mannor and Shimkin. "The Steering Approach for Multi-Criteria Reinforcement Learning." Neural Information Processing Systems, 2001.](https://mlanthology.org/neurips/2001/mannor2001neurips-steering/)
BibTeX
@inproceedings{mannor2001neurips-steering,
title = {{The Steering Approach for Multi-Criteria Reinforcement Learning}},
author = {Mannor, Shie and Shimkin, Nahum},
booktitle = {Neural Information Processing Systems},
year = {2001},
pages = {1563-1570},
url = {https://mlanthology.org/neurips/2001/mannor2001neurips-steering/}
}