Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity
Abstract
Importance sampling is an essential component of model-free off-policy learning algorithms. Weighted importance sampling (WIS) is generally considered superior to ordinary importance sampling but, when combined with function approximation, it has hitherto required computational complexity that is $O(n^2)$ or more in the number of features. In this paper we introduce new off-policy learning algorithms that obtain most of the benefits of WIS with $O(n)$ computational complexity. Our algorithms maintain for each component of the parameter vector a measure of the extent to which that component has been used in previous examples. This measure is used to determine component-wise step sizes, merging the ideas of stochastic gradient descent and sample averages. We present our main WIS-based algorithm first in an intuitive acausal form (the forward view) and then derive a causal algorithm using eligibility traces that is equivalent but more efficient (the backward view). In three small experiments, our algorithms performed significantly better than prior $O(n)$ algorithms for off-policy policy evaluation. We also show that our adaptive step-size technique alone can improve the performance of on-policy algorithms such as TD\la and true online TD\la.
Cite
Text
Mahmood and Sutton. "Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity." Conference on Uncertainty in Artificial Intelligence, 2015.Markdown
[Mahmood and Sutton. "Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity." Conference on Uncertainty in Artificial Intelligence, 2015.](https://mlanthology.org/uai/2015/mahmood2015uai-off/)BibTeX
@inproceedings{mahmood2015uai-off,
title = {{Off-Policy Learning Based on Weighted Importance Sampling with Linear Computational Complexity}},
author = {Mahmood, Ashique Rupam and Sutton, Richard S.},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2015},
pages = {552-561},
url = {https://mlanthology.org/uai/2015/mahmood2015uai-off/}
}