Mean-Field Theory for Batched-TD(λ)

Pineda, Fernando J.

doi:10.1162/NECO.1997.9.7.1403

Neural Computation, vol. 9, 1997, pp. 1403-1419

Abstract

A representation-independent mean-field dynamics is presented for batched TD(λ). The task is learning to predict the outcome of an indirectly observed absorbing Markov process. In the case of linear representations, the discrete-time deterministic iteration is an affine map whose fixed point can be expressed in closed form without the assumption of linearly independent observation vectors. Batched linear TD(λ) is proved to converge with probability 1 for all λ. Theory and simulation agree on a random walk example.
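The affine structure mentioned in the abstract can be illustrated with a small sketch. It is not the paper's mean-field derivation: it assumes a five-state symmetric random walk with one-hot (linearly independent) features, so the more general closed form from the paper is not reproduced here. For a fixed batch, the accumulated TD(λ) increment is affine in the weights, `b - A @ w`, so repeated batch updates can be compared against the solution of `A w = b`. The names `generate_walk`, `batch_affine_terms`, and `iterate`, the step size, and the normalisation by batch size are all choices made for this illustration.

```python
import numpy as np

def generate_walk(rng, n_states=5):
    """Sample one walk from the centre state; return (visited states, outcome)."""
    s = n_states // 2
    states = []
    while 0 <= s < n_states:
        states.append(s)
        s += 1 if rng.random() < 0.5 else -1
    # Outcome is 1 if the walk exits on the right, 0 if it exits on the left.
    return states, float(s == n_states)

def batch_affine_terms(walks, n_states, lam):
    """Return (A, b) such that the batch TD(lambda) increment is b - A @ w."""
    A = np.zeros((n_states, n_states))
    b = np.zeros(n_states)
    eye = np.eye(n_states)  # one-hot observation vectors (an assumption)
    for states, z in walks:
        trace = np.zeros(n_states)
        for t, s in enumerate(states):
            # Eligibility trace: discounted sum of past observation vectors.
            trace = lam * trace + eye[s]
            last = t == len(states) - 1
            x_next = np.zeros(n_states) if last else eye[states[t + 1]]
            A += np.outer(trace, eye[s] - x_next)
            if last:
                # On the final step the next prediction is the outcome z.
                b += z * trace
    # Averaging over the batch is a scaling choice made for this sketch.
    return A / len(walks), b / len(walks)

def iterate(A, b, alpha=0.1, sweeps=5000):
    """Apply the batch update w <- w + alpha * (b - A @ w) repeatedly."""
    w = np.full(len(b), 0.5)
    for _ in range(sweeps):
        w = w + alpha * (b - A @ w)
    return w

rng = np.random.default_rng(0)
walks = [generate_walk(rng) for _ in range(200)]
A, b = batch_affine_terms(walks, n_states=5, lam=0.3)
w_iter = iterate(A, b)            # repeated presentation of the batch
w_fixed = np.linalg.solve(A, b)   # fixed point of the affine map
true_values = np.arange(1, 6) / 6.0
```

With one-hot features and every state visited at least once, `A` is invertible, so `w_iter` and `w_fixed` should agree to numerical precision, and both should lie near the exact right-exit probabilities in `true_values` up to sampling noise in the batch.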

Cite

Text

Pineda. "Mean-Field Theory for Batched-TD(λ)." Neural Computation, 1997. doi:10.1162/NECO.1997.9.7.1403

Markdown

[Pineda. "Mean-Field Theory for Batched-TD(λ)." Neural Computation, 1997.](https://mlanthology.org/neco/1997/pineda1997neco-meanfield/) doi:10.1162/NECO.1997.9.7.1403

BibTeX

@article{pineda1997neco-meanfield,
  title     = {{Mean-Field Theory for Batched-TD($\lambda$)}},
  author    = {Pineda, Fernando J.},
  journal   = {Neural Computation},
  year      = {1997},
  pages     = {1403--1419},
  doi       = {10.1162/NECO.1997.9.7.1403},
  volume    = {9},
  url       = {https://mlanthology.org/neco/1997/pineda1997neco-meanfield/}
}