Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity

Mahmood, Ashique Rupam; Sutton, Richard S.

Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity

Ashique Rupam Mahmood, Richard S. Sutton

UAI 2015 pp. 552-561

/uai/2015/mahmood2015uai-off/

Abstract

Importance sampling is an essential component of model-free off-policy learning algorithms. Weighted importance sampling (WIS) is generally considered superior to ordinary importance sampling but, when combined with function approximation, it has hitherto required computational complexity that is $O(n^2)$ or more in the number of features. In this paper we introduce new off-policy learning algorithms that obtain most of the benefits of WIS with $O(n)$ computational complexity. Our algorithms maintain for each component of the parameter vector a measure of the extent to which that component has been used in previous examples. This measure is used to determine component-wise step sizes, merging the ideas of stochastic gradient descent and sample averages. We present our main WIS-based algorithm first in an intuitive acausal form (the forward view) and then derive a causal algorithm using eligibility traces that is equivalent but more efficient (the backward view). In three small experiments, our algorithms performed significantly better than prior $O(n)$ algorithms for off-policy policy evaluation. We also show that our adaptive step-size technique alone can improve the performance of on-policy algorithms such as TD\la and true online TD\la.

PDF UAI Semantic Scholar

Cite

Text

Mahmood and Sutton. "Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity." Conference on Uncertainty in Artificial Intelligence, 2015.

Markdown

[Mahmood and Sutton. "Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity." Conference on Uncertainty in Artificial Intelligence, 2015.](https://mlanthology.org/uai/2015/mahmood2015uai-off/)

BibTeX

@inproceedings{mahmood2015uai-off,
  title     = {{Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity}},
  author    = {Mahmood, Ashique Rupam and Sutton, Richard S.},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2015},
  pages     = {552-561},
  url       = {https://mlanthology.org/uai/2015/mahmood2015uai-off/}
}