Estimating Q(s,s') with Deep Deterministic Dynamics Gradients

Abstract

In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at http://sites.google.com/view/qss-paper.
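The abstract's two ideas, a value function over state transitions and a dynamics model trained to propose value-maximizing next states, correspond to a Bellman-style recursion of the form $Q(s, s') = r(s, s') + \gamma \max_{s''} Q(s', s'')$, with the maximum over successor states approximated by the learned model. The sketch below is a rough PyTorch illustration of the corresponding critic and model updates, not the authors' released implementation; network sizes, hyperparameters, and names are assumptions, and acting in an environment would additionally require mapping the proposed next state back to an action (for example via an inverse dynamics model).

```python
# Minimal sketch (assumed details, not the authors' code) of the two networks the
# abstract describes: a Q(s, s') critic over state transitions and a forward
# dynamics model tau(s) trained to propose next states that maximize that value.
import torch
import torch.nn as nn

STATE_DIM, HIDDEN, GAMMA = 4, 64, 0.99  # illustrative sizes

class QSS(nn.Module):
    """Critic: value of moving from state s to neighboring state s'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * STATE_DIM, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, 1))
    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

class Tau(nn.Module):
    """Forward model: proposes a next state from the current state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, STATE_DIM))
    def forward(self, s):
        return self.net(s)

q, tau = QSS(), Tau()
q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)
tau_opt = torch.optim.Adam(tau.parameters(), lr=1e-3)

def update(s, s_next, r, done):
    """One gradient step on a batch of (s, s', r, done) transitions."""
    # Critic: TD target bootstraps through the model's proposed successor of s',
    # i.e. Q(s, s') <- r + gamma * Q(s', tau(s')).
    with torch.no_grad():
        target = r + GAMMA * (1.0 - done) * q(s_next, tau(s_next))
    critic_loss = nn.functional.mse_loss(q(s, s_next), target)
    q_opt.zero_grad(); critic_loss.backward(); q_opt.step()

    # Model: ascend Q by changing the proposed next state, analogous to the
    # deterministic policy gradient step in DDPG but over states.
    model_loss = -q(s, tau(s)).mean()
    tau_opt.zero_grad(); model_loss.backward(); tau_opt.step()
```

Because the transitions only need states and rewards, such an update can in principle be run on observation-only data from arbitrary behavior policies, which is the off-policy property the abstract emphasizes.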

Cite

Text

Edwards et al. "Estimating Q(s,s') with Deep Deterministic Dynamics Gradients." International Conference on Machine Learning, 2020.

Markdown

[Edwards et al. "Estimating Q(s,s') with Deep Deterministic Dynamics Gradients." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/edwards2020icml-estimating/)

BibTeX

@inproceedings{edwards2020icml-estimating,
  title     = {{Estimating Q(s,s') with Deep Deterministic Dynamics Gradients}},
  author    = {Edwards, Ashley and Sahni, Himanshu and Liu, Rosanne and Hung, Jane and Jain, Ankit and Wang, Rui and Ecoffet, Adrien and Miconi, Thomas and Isbell, Charles and Yosinski, Jason},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {2825--2835},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/edwards2020icml-estimating/}
}