Optimizing Average Reward Using Discounted Rewards

Kakade, Sham M.

doi:10.1007/3-540-44581-1_40

Optimizing Average Reward Using Discounted Rewards

Sham M. Kakade

COLT 2001 pp. 605-615

doi:10.1007/3-540-44581-1_40 /colt/2001/kakade2001colt-optimizing/

Abstract

In many reinforcement learningproblems, it is appropriate to optimize the average reward. In practice, this is often done by solving the Bellman equations usinga discount factor close to 1. In this paper, we provide a bound on the average reward of the policy obtained by solving the Bellman equations which depends on the relationship between the discount factor and the mixingtime of the Markov chain. We extend this result to the direct policy gradient of Baxter and Bartlett, in which a discount parameter is used to find a biased estimate of the gradient of the average reward with respect to the parameters of a policy. We show that this biased gradient is an exact gradient of a related discounted problem and provide a bound on the optima found by followingthese biased gradients of the average reward. Further, we show that the exact Hessian in this related discounted problem is an approximate Hessian of the average reward, with equality in the limit the discount factor tends to 1. We then provide an algorithm to estimate the Hessian from a sample path of the underlyingMark ov chain, which converges with probability 1.

PDF COLT Semantic Scholar

Cite

Text

Kakade. "Optimizing Average Reward Using Discounted Rewards." Annual Conference on Computational Learning Theory, 2001. doi:10.1007/3-540-44581-1_40

Markdown

[Kakade. "Optimizing Average Reward Using Discounted Rewards." Annual Conference on Computational Learning Theory, 2001.](https://mlanthology.org/colt/2001/kakade2001colt-optimizing/) doi:10.1007/3-540-44581-1_40

BibTeX

@inproceedings{kakade2001colt-optimizing,
  title     = {{Optimizing Average Reward Using Discounted Rewards}},
  author    = {Kakade, Sham M.},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2001},
  pages     = {605-615},
  doi       = {10.1007/3-540-44581-1_40},
  url       = {https://mlanthology.org/colt/2001/kakade2001colt-optimizing/}
}