Loosely Consistent Emphatic Temporal-Difference Learning
Abstract
There has been significant interest in searching for off-policy Temporal-Difference (TD) algorithms that find the same solution that would have been obtained in the on-policy regime. An important property of such algorithms is that their expected update has the same fixed point as that of On-policy TD($\lambda$), which we call loose consistency. Notably, Full-IS-TD($\lambda$) is the only existing loosely consistent method under general linear function approximation but, unfortunately, suffers from high variance and is scarcely practical. This notorious high-variance issue motivates the introduction of ETD($\lambda$), which reduces the variance but has a biased fixed point. Inspired by these two methods, we propose a new loosely consistent algorithm with a transient bias, called Average Emphatic TD (AETD($\lambda$)), which strikes a balance between bias and variance. Further, we unify AETD($\lambda$) with existing methods and obtain a new family of loosely consistent algorithms, called Loosely Consistent Emphatic TD (LC-ETD($\lambda$, $\beta$, $\nu$)), which realizes a smooth bias-variance trade-off by controlling the speed at which the transient bias fades. Through experiments on illustrative examples, we show the effectiveness and practicality of LC-ETD($\lambda$, $\beta$, $\nu$).
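As background for the emphatic family discussed above, the standard ETD($\lambda$) update of Sutton, Mahmood, and White (2016) with linear function approximation is sketched below. The sketch assumes a constant discount $\gamma$, unit interest in every state, and importance-sampling ratios $\rho_t = \pi(A_t \mid S_t)/\mu(A_t \mid S_t)$; it illustrates only the classical method, not the AETD($\lambda$) or LC-ETD($\lambda$, $\beta$, $\nu$) updates introduced in this work.

$$F_t = \gamma \rho_{t-1} F_{t-1} + 1, \qquad M_t = \lambda + (1 - \lambda) F_t,$$
$$e_t = \rho_t \left( \gamma \lambda e_{t-1} + M_t x_t \right), \qquad w_{t+1} = w_t + \alpha \, \delta_t \, e_t,$$

where $x_t$ is the feature vector of state $S_t$ and $\delta_t = R_{t+1} + \gamma w_t^\top x_{t+1} - w_t^\top x_t$ is the TD error. The followon trace $F_t$ accumulates products of importance-sampling ratios and is the main source of variance in emphatic methods; the LC-ETD($\lambda$, $\beta$, $\nu$) family instead trades bias for variance by controlling the speed at which its transient bias fades.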
Cite

He et al. "Loosely Consistent Emphatic Temporal-Difference Learning." Uncertainty in Artificial Intelligence, 2023. https://mlanthology.org/uai/2023/he2023uai-loosely/

BibTeX
@inproceedings{he2023uai-loosely,
title = {{Loosely Consistent Emphatic Temporal-Difference Learning}},
author = {He, Jiamin and Che, Fengdi and Wan, Yi and Mahmood, A. Rupam},
booktitle = {Uncertainty in Artificial Intelligence},
year = {2023},
pages = {849-859},
volume = {216},
url = {https://mlanthology.org/uai/2023/he2023uai-loosely/}
}