On Convergence of Gradient Expected Sarsa(λ)

Abstract

We study the convergence of Expected Sarsa(λ) with function approximation. We show that applying an off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning. Furthermore, based on the convex-concave saddle-point framework, we propose a convergent Gradient Expected Sarsa(λ) (GES(λ)) algorithm. The theoretical analysis shows that the proposed GES(λ) converges to the optimal solution at a linear convergence rate under the true-gradient setting. We further develop a Lyapunov function technique to investigate how the step-size influences the finite-time performance of GES(λ). Such a Lyapunov-function technique can potentially be generalized to other gradient temporal-difference algorithms. Finally, our experiments verify the effectiveness of GES(λ). For details of the proofs, please refer to https://arxiv.org/pdf/2012.07199.pdf.
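The saddle-point reformulation mentioned in the abstract is not spelled out on this page. As a hedged illustration, gradient temporal-difference methods are commonly derived by rewriting the mean-squared projected Bellman error (MSPBE) via Fenchel duality into a convex-concave objective; the exact objective GES(λ) optimizes may differ, so see the arXiv version for the paper's formulation:

$$\mathrm{MSPBE}(\theta) = \|b - A\theta\|_{M^{-1}}^2 = \max_{w}\left(2\,w^\top (b - A\theta) - w^\top M w\right),$$

where $A$, $b$, and $M$ are the usual expected-feature quantities of linear TD learning. The resulting min-max problem is then solved by simultaneous gradient descent on $\theta$ and ascent on $w$.

For concreteness, below is a minimal sketch of the baseline the paper analyzes: Expected Sarsa(λ) with linear function approximation and an accumulating eligibility trace. All names (`phi_sa`, `pi_next`, etc.) are illustrative assumptions, not taken from the paper, and the sketch omits the gradient correction that makes GES(λ) convergent.

```python
import numpy as np

def expected_sarsa_lambda_step(theta, z, phi_sa, r, phi_next_all,
                               pi_next, gamma, lam, alpha):
    """One semi-gradient Expected Sarsa(lambda) update with linear
    function approximation (illustrative sketch, not the paper's GES(lambda)).

    theta        : weight vector
    z            : eligibility trace, same shape as theta
    phi_sa       : feature vector for the current state-action pair
    phi_next_all : list of feature vectors, one per next-state action
    pi_next      : target-policy probabilities over next actions
    """
    q_sa = theta @ phi_sa
    # Expectation of the next Q-value under the target policy
    # (this expectation is what distinguishes Expected Sarsa from Sarsa).
    expected_q_next = sum(p * (theta @ phi)
                          for p, phi in zip(pi_next, phi_next_all))
    delta = r + gamma * expected_q_next - q_sa   # TD error
    z = gamma * lam * z + phi_sa                 # accumulating trace
    theta = theta + alpha * delta * z            # semi-gradient step
    return theta, z
```

As a quick usage note: with `d`-dimensional features, one would initialize `theta = np.zeros(d)` and `z = np.zeros(d)`, call the function once per transition, and reset `z` at episode boundaries.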

Cite

Text

Yang et al. "On Convergence of Gradient Expected Sarsa(λ)." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I12.17270

Markdown

[Yang et al. "On Convergence of Gradient Expected Sarsa(λ)." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/yang2021aaai-convergence/) doi:10.1609/AAAI.V35I12.17270

BibTeX

@inproceedings{yang2021aaai-convergence,
  title     = {{On Convergence of Gradient Expected Sarsa(λ)}},
  author    = {Yang, Long and Zheng, Gang and Zhang, Yu and Zheng, Qian and Li, Pengfei and Pan, Gang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {10621--10629},
  doi       = {10.1609/AAAI.V35I12.17270},
  url       = {https://mlanthology.org/aaai/2021/yang2021aaai-convergence/}
}