Least-Squares Temporal Difference with Expected Eligibility Traces

Abstract

Temporal Difference (TD) and Least-Squares Temporal Difference (LSTD) are related methods to estimate the value function of a Markov Decision Process (MDP). While TD is a direct method that uses local data to update the value function estimate, LSTD is a Bellman projected equation method that uses the full data to compute a one-time estimate. TD($\lambda$) and LSTD($\lambda$) extend TD and LSTD with eligibility traces. While estimating the value function, TD($\lambda$) and LSTD($\lambda$) use actual histories of features as traces. Recently, expected eligibility traces have been proposed for TD($\lambda$) to include not only actual histories but also all potential histories of features that could have occurred based on the model or the available data. While this idea can account for non-linear feature architectures, here we limit ourselves to linear feature architectures with full data updates in the context of LSTD. We show that, in striking contrast with the direct versions, an extension of LSTD that includes the theoretical expected eligibility traces is equivalent to LSTD without eligibility traces (LSTD(0)). We obtain a similar result if we consider mixed eligibility traces, i.e., a combination of expected eligibility traces and ordinary eligibility traces. In fact, we show that LSTD with theoretical mixed eligibility traces is equivalent to LSTD($\lambda'$) for a given $\lambda'$ that captures both the decay of the eligibility trace and the balance between the expected eligibility trace and the ordinary trace. Furthermore, we consider alternative methods LSET($\lambda$) and LSET($\eta$, $\lambda$), which rely on the empirical means of the eligibility traces rather than the theoretical expected eligibility traces, and show that their value estimates converge to those of LSTD(0) and LSTD($\lambda'$), respectively.
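For readers who want the objects above in concrete form, here is a minimal, dependency-free sketch. It assumes the standard accumulating-trace form of LSTD($\lambda$) on a single trajectory (z_t = γλ z_{t-1} + φ_t, then solve Σ_t z_t (φ_t − γ φ_{t+1})ᵀ θ = Σ_t z_t r_t), where setting λ = 0 gives LSTD(0). It then illustrates one algebraic mechanism by which a trace that depends only on the current state's features collapses to LSTD(0). This mechanism rests on an assumption of this sketch, not necessarily the paper's exact definition of expected traces: the trace is modeled as z_t = M φ(S_t) for a fixed invertible matrix M, so the system becomes (M A) θ = M b and has the same solution as LSTD(0). The 5-state chain, the features, the rewards, the matrix M, and the helper names (`solve`, `lstd`, `accumulating_traces`) are hypothetical illustration choices. LSET($\lambda$), LSET($\eta$, $\lambda$), and the mixed-trace equivalence are not implemented here.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lstd(phis, rewards, gamma, traces):
    """Solve sum_t z_t (phi_t - gamma*phi_{t+1})^T theta = sum_t z_t r_t for theta."""
    k = len(phis[0])
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for t, r in enumerate(rewards):
        z, phi, phi_next = traces[t], phis[t], phis[t + 1]
        for i in range(k):
            b[i] += z[i] * r
            for j in range(k):
                A[i][j] += z[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


def accumulating_traces(phis, n_steps, gamma, lam):
    """Ordinary traces z_t = gamma*lam*z_{t-1} + phi_t, starting from z_{-1} = 0."""
    z = [0.0] * len(phis[0])
    out = []
    for t in range(n_steps):
        z = [gamma * lam * zi + fi for zi, fi in zip(z, phis[t])]
        out.append(z)
    return out


# Hypothetical example: a 5-state random walk with reflecting ends.
rng = random.Random(0)
N_STATES, GAMMA, T = 5, 0.9, 5000
features = [[1.0, s / 4, (s / 4) ** 2] for s in range(N_STATES)]  # 3 features < 5 states
state_reward = [rng.uniform(-1.0, 1.0) for _ in range(N_STATES)]
states = [0]
for _ in range(T):
    s = states[-1]
    states.append(min(N_STATES - 1, max(0, s + rng.choice([-1, 1]))))
phis = [features[s] for s in states]               # phi(S_0), ..., phi(S_T)
rewards = [state_reward[s] for s in states[:-1]]   # r_t = R(S_t) for t = 0..T-1

theta_0 = lstd(phis, rewards, GAMMA, phis[:T])     # LSTD(0): z_t = phi_t
theta_lam = lstd(phis, rewards, GAMMA, accumulating_traces(phis, T, GAMMA, 0.8))

# Illustrative assumption (not the paper's definition): the trace is a fixed
# invertible linear map of the current features, z_t = M phi(S_t).
M = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 3.0]]
mapped = [[sum(M[i][j] * phi[j] for j in range(3)) for i in range(3)] for phi in phis[:T]]
theta_M = lstd(phis, rewards, GAMMA, mapped)  # (M A) theta = M b, same solution as LSTD(0)

print("LSTD(0):      ", [round(v, 4) for v in theta_0])
print("z_t = M phi_t:", [round(v, 4) for v in theta_M])
print("LSTD(0.8):    ", [round(v, 4) for v in theta_lam])
```

Running it should show the first two coefficient vectors agreeing up to floating-point error, while LSTD(0.8) generally differs because three features cannot represent the value function of five states exactly. A small hand-written solver keeps the sketch free of third-party dependencies; with NumPy one would typically call a linear solver instead.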

Cite

Text

van Zuijlen and Antunes. "Least-Squares Temporal Difference with Expected Eligibility Traces." Machine Learning, 2025. doi:10.1007/S10994-025-06912-Z

Markdown

[van Zuijlen and Antunes. "Least-Squares Temporal Difference with Expected Eligibility Traces." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/vanzuijlen2025mlj-leastsquares/) doi:10.1007/S10994-025-06912-Z

BibTeX

@article{vanzuijlen2025mlj-leastsquares,
  title     = {{Least-Squares Temporal Difference with Expected Eligibility Traces}},
  author    = {van Zuijlen, Roy and Antunes, Duarte},
  journal   = {Machine Learning},
  year      = {2025},
  pages     = {269},
  doi       = {10.1007/S10994-025-06912-Z},
  volume    = {114},
  url       = {https://mlanthology.org/mlj/2025/vanzuijlen2025mlj-leastsquares/}
}