Exploiting Multi-Step Sample Trajectories for Approximate Value Iteration

Abstract

Approximate value iteration methods for reinforcement learning (RL) generalize experience from limited samples across large state-action spaces. The function approximators used in such methods typically introduce errors in value estimation, which can degrade the quality of the learned value functions. We present a new batch-mode, off-policy, approximate value iteration algorithm called Trajectory Fitted Q-Iteration (TFQI). TFQI uses the sequential relationship between samples within a trajectory (a set of samples gathered sequentially from the problem domain) to lessen the adverse influence of approximation errors while deriving long-term value. We provide a detailed description of the TFQI approach and an empirical study that analyzes the impact of our method on two well-known RL benchmarks. Our experiments demonstrate that this approach has significant benefits, including better learned policy performance, improved convergence, and somewhat reduced sensitivity to the choice of function approximator.
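For readers unfamiliar with batch-mode fitted Q-iteration, the sketch below illustrates the general idea the abstract builds on: each iteration regresses a Q-function onto targets computed from a fixed batch of transitions. It is a minimal sketch under stated assumptions, not the paper's implementation. It assumes small, hashable, discrete states and actions, and `fit_tabular` (a per-(state, action) average) is a hypothetical stand-in for a real function approximator. The `use_trajectory` flag shows one hypothetical way to exploit the sequential relationship between samples in a trajectory: targets are computed backward along each trajectory and take the larger of the one-step bootstrap and the discounted target of the observed next step. That variant is an illustrative assumption (a sensible lower bound only under deterministic dynamics) and is not necessarily the exact TFQI update rule.

```python
from collections import defaultdict

# A transition is (state, action, reward, next_state, done); a trajectory is a
# list of consecutive transitions. Assumes small, hashable, discrete states/actions.


def fit_tabular(inputs, targets):
    """Hypothetical stand-in regressor: averages the targets seen for each
    (state, action) pair; unseen pairs default to 0.0."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(inputs, targets):
        sums[key] += y
        counts[key] += 1
    table = {key: sums[key] / counts[key] for key in sums}
    return lambda s, a: table.get((s, a), 0.0)


def fitted_q_iteration(trajectories, actions, gamma=0.9, iterations=50,
                       use_trajectory=False):
    q = fit_tabular([], [])  # initial Q is zero everywhere
    for _ in range(iterations):
        inputs, targets = [], []
        for traj in trajectories:
            later_target = None  # target of the transition that follows in traj
            # Walk backward so each step can see its successor's target.
            for s, a, r, s_next, done in reversed(traj):
                if done:
                    y = r  # terminal: no bootstrapping
                else:
                    # Standard one-step fitted Q-iteration target.
                    y = r + gamma * max(q(s_next, b) for b in actions)
                    if use_trajectory and later_target is not None:
                        # Hypothetical trajectory-aware variant: also consider
                        # the discounted target of the observed next step.
                        y = max(y, r + gamma * later_target)
                inputs.append((s, a))
                targets.append(y)
                later_target = y
        q = fit_tabular(inputs, targets)  # refit on the whole batch
    return q


if __name__ == "__main__":
    # Three-state chain: moving right ("R") reaches a terminal reward of 1.
    chain = [(0, "R", 0.0, 1, False), (1, "R", 0.0, 2, False),
             (2, "R", 1.0, 3, True)]
    q_std = fitted_q_iteration([chain], ["R"], iterations=1)
    q_trj = fitted_q_iteration([chain], ["R"], iterations=1, use_trajectory=True)
    print(q_std(0, "R"), q_trj(0, "R"))  # prints 0.0 0.81
```

On this toy chain, a single iteration of one-step targets propagates the terminal reward back only one transition, while the backward pass along the trajectory carries it to the start state at once; with enough iterations (three here) plain fitted Q-iteration also reaches 0.81 for the start state.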

Cite

Text

Wright et al. "Exploiting Multi-Step Sample Trajectories for Approximate Value Iteration." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013. doi:10.1007/978-3-642-40988-2_8

Markdown

[Wright et al. "Exploiting Multi-Step Sample Trajectories for Approximate Value Iteration." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013.](https://mlanthology.org/ecmlpkdd/2013/wright2013ecmlpkdd-exploiting/) doi:10.1007/978-3-642-40988-2_8

BibTeX

@inproceedings{wright2013ecmlpkdd-exploiting,
  title     = {{Exploiting Multi-Step Sample Trajectories for Approximate Value Iteration}},
  author    = {Wright, Robert William and Loscalzo, Steven and Dexter, Philip and Yu, Lei},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2013},
  pages     = {113--128},
  doi       = {10.1007/978-3-642-40988-2_8},
  url       = {https://mlanthology.org/ecmlpkdd/2013/wright2013ecmlpkdd-exploiting/}
}