Learning Non-Markovian Decision-Making from State-Only Sequences
Abstract
Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often unobservable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from the assumptions of a standard Markov Decision Process (MDP). To address these challenges, we explore deep generative modeling of state-only sequences with a non-Markov Decision Process (nMDP), where the policy is an energy-based prior in the latent space of the state transition generator. We develop maximum likelihood estimation to achieve model-based imitation, which involves short-run MCMC sampling from the prior and importance sampling for the posterior. The learned model enables $\textit{decision-making as inference}$: model-free policy execution is equivalent to prior sampling, while model-based planning corresponds to posterior sampling initialized from the policy. We demonstrate the efficacy of the proposed method in a prototypical path planning task with non-Markovian constraints and show that the learned model exhibits strong performance in challenging domains from the MuJoCo suite.
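The abstract mentions short-run MCMC sampling from an energy-based prior over latents. The snippet below is a minimal illustrative sketch of short-run Langevin dynamics for such a prior, not the authors' implementation; the function name, hyperparameters, and toy energy are assumptions made for illustration only.

```python
import torch

def short_run_langevin(energy_fn, z_init, n_steps=20, step_size=0.1):
    """Short-run Langevin sampling from an energy-based prior p(z) ∝ exp(-E(z)).

    energy_fn: callable mapping a batch of latents z to per-sample energies E(z).
    z_init:    initial latents, e.g. drawn from a Gaussian reference distribution.
    (Hypothetical sketch; step counts and step sizes are illustrative.)
    """
    z = z_init.clone().requires_grad_(True)
    for _ in range(n_steps):
        energy = energy_fn(z).sum()
        grad, = torch.autograd.grad(energy, z)
        noise = torch.randn_like(z)
        # Langevin update: gradient step on the energy plus injected Gaussian noise.
        z = (z - 0.5 * step_size ** 2 * grad + step_size * noise)
        z = z.detach().requires_grad_(True)
    return z.detach()

if __name__ == "__main__":
    # Toy quadratic energy, so the prior is approximately a standard Gaussian.
    energy = lambda z: 0.5 * (z ** 2).sum(dim=-1)
    z0 = torch.randn(64, 8)             # 64 latent samples of dimension 8
    z = short_run_langevin(energy, z0)  # short-run samples from the prior
    print(z.shape)
```

In the paper's framing, sampling latents from such a prior would correspond to model-free policy execution, while initializing a posterior sampler from these samples would correspond to model-based planning.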
Cite
Text
Qin et al. "Learning Non-Markovian Decision-Making from State-Only Sequences." Neural Information Processing Systems, 2023.

Markdown

[Qin et al. "Learning Non-Markovian Decision-Making from State-Only Sequences." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/qin2023neurips-learning/)

BibTeX
@inproceedings{qin2023neurips-learning,
title = {{Learning Non-Markovian Decision-Making from State-Only Sequences}},
author = {Qin, Aoyang and Gao, Feng and Li, Qing and Zhu, Song-Chun and Xie, Sirui},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/qin2023neurips-learning/}
}