An Average-Reward Reinforcement Learning Algorithm for Computing Bias-Optimal Policies

Mahadevan, Sridhar

AAAI 1996 pp. 875-880

Abstract

Average-reward reinforcement learning (ARL) is an undiscounted optimality framework that is generally applicable to a broad range of control tasks. ARL computes gain-optimal control policies that maximize the expected payoff per step. However, gain optimality has some intrinsic limitations as an optimality criterion, since, for example, it cannot distinguish between different policies that all reach an absorbing goal state but incur varying costs. A more selective criterion is bias optimality, which can filter gain-optimal policies to select those that reach absorbing goals with the minimum cost. While several ARL algorithms for computing gain-optimal policies have been proposed, none of these algorithms can guarantee bias optimality, since this requires solving at least two nested optimality equations. In this paper, we describe a novel model-based ARL algorithm for computing bias-optimal policies. We test the proposed algorithm using an admission control queuing system, and show that it is able to utilize the queue much more efficiently than a gain-optimal method by learning bias-optimal policies.
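
The distinction between the two criteria can be illustrated with a toy example. The sketch below is not taken from the paper and is not the proposed algorithm: the three-state chain, the reward values, and the helper `gain_and_bias` are all hypothetical, and the helper only handles deterministic policies that end in an absorbing state. Both policies reach the zero-reward goal, so both have gain 0, while their biases (the accumulated reward in excess of the gain) differ.

```python
# Toy illustration (assumed setup, not from the paper): states 0 and 1 are
# transient, state 2 is an absorbing goal with reward 0 on its self-loop.

def gain_and_bias(next_state, reward, start):
    """Return (gain, bias) of a deterministic policy started in `start`.

    next_state[s] and reward[s] give the successor and the one-step reward
    under the policy. Assumes the policy ends in an absorbing state.
    """
    s, transient_rewards, seen = start, [], set()
    while next_state[s] != s:
        if s in seen:
            raise ValueError("policy never reaches an absorbing state")
        seen.add(s)
        transient_rewards.append(reward[s])
        s = next_state[s]
    gain = reward[s]  # long-run average reward equals the absorbing reward
    bias = sum(r - gain for r in transient_rewards)  # excess over the gain
    return gain, bias

# Every non-goal step costs 1 (reward -1); the goal self-loop costs nothing.
reward = {0: -1, 1: -1, 2: 0}
direct = {0: 2, 1: 2, 2: 2}  # policy A: go straight to the goal
detour = {0: 1, 1: 2, 2: 2}  # policy B: pass through state 1 first

gain_direct, bias_direct = gain_and_bias(direct, reward, start=0)
gain_detour, bias_detour = gain_and_bias(detour, reward, start=0)
```

Under these assumptions the two gains are equal, so a gain-optimal method has no reason to prefer either policy, whereas comparing biases favors the direct route.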

Cite

Text

Mahadevan. "An Average-Reward Reinforcement Learning Algorithm for Computing Bias-Optimal Policies." AAAI Conference on Artificial Intelligence, 1996.

Markdown

[Mahadevan. "An Average-Reward Reinforcement Learning Algorithm for Computing Bias-Optimal Policies." AAAI Conference on Artificial Intelligence, 1996.](https://mlanthology.org/aaai/1996/mahadevan1996aaai-average/)

BibTeX

@inproceedings{mahadevan1996aaai-average,
  title     = {{An Average-Reward Reinforcement Learning Algorithm for Computing Bias-Optimal Policies}},
  author    = {Mahadevan, Sridhar},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1996},
  pages     = {875--880},
  url       = {https://mlanthology.org/aaai/1996/mahadevan1996aaai-average/}
}