Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value

Abstract

We propose a new class of computationally fast algorithms for finding a close-to-optimal policy for Markov Decision Processes (MDPs) with a large finite horizon T. The main idea is that instead of planning until the time horizon T, we plan only up to a truncated horizon H << T and use an estimate of the true optimal value function as the terminal value. Our approach to finding the terminal value function is to learn a mapping from an MDP to its value function by solving many similar MDPs during a training phase and fitting a regression estimator. We analyze the method by providing an error propagation theorem that shows the effect of various sources of error on the quality of the solution. We also empirically validate the approach in a real-world application, the design of an energy management system for Hybrid Electric Vehicles, with promising results.
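The core idea described above, planning backward for only H steps and bootstrapping from a learned terminal value rather than planning out to T, can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: it assumes a small tabular MDP given by a transition tensor P and reward matrix R, and a precomputed terminal_value array standing in for the output of the regression estimator trained on similar MDPs.

import numpy as np

def truncated_plan(P, R, H, terminal_value):
    # Backward induction over a truncated horizon H (assumes H >= 1).
    # P: (A, S, S) transition probabilities P[a, s, s'].
    # R: (S, A) expected immediate rewards.
    # terminal_value: length-S estimate of the optimal value at step H,
    #   e.g. the prediction of a regressor fit on previously solved MDPs.
    V = terminal_value.copy()                   # terminal value replaces planning out to T
    for _ in range(H):
        Q = R + np.einsum("asx,x->sa", P, V)    # Q[s, a] = R[s, a] + E[V(s') | s, a]
        V = Q.max(axis=1)                       # Bellman backup
    return V, Q.argmax(axis=1)                  # value and greedy first-step policy

# Toy 2-state, 2-action MDP with a placeholder terminal-value estimate.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],        # action 0
              [[0.5, 0.5], [0.5, 0.5]]])       # action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])
V, policy = truncated_plan(P, R, H=5, terminal_value=np.zeros(2))

The cost of one such plan scales with H rather than T, which is the computational advantage the paper targets; the quality of the resulting policy then depends on how accurate the terminal-value estimate is, as quantified by the error propagation analysis.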

Cite

Text

Farahmand et al. "Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value." AAAI Conference on Artificial Intelligence, 2016. doi:10.1609/AAAI.V30I1.10397

Markdown

[Farahmand et al. "Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value." AAAI Conference on Artificial Intelligence, 2016.](https://mlanthology.org/aaai/2016/farahmand2016aaai-truncated/) doi:10.1609/AAAI.V30I1.10397

BibTeX

@inproceedings{farahmand2016aaai-truncated,
  title     = {{Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value}},
  author    = {Farahmand, Amir-massoud and Nikovski, Daniel Nikolaev and Igarashi, Yuji and Konaka, Hiroki},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2016},
  pages     = {3123-3129},
  doi       = {10.1609/AAAI.V30I1.10397},
  url       = {https://mlanthology.org/aaai/2016/farahmand2016aaai-truncated/}
}