Deep Recurrent Optimal Stopping
Abstract
Deep neural networks (DNNs) have recently emerged as a powerful paradigm for solving Markovian optimal stopping problems. However, a ready extension of DNN-based methods to non-Markovian settings requires significant state and parameter space expansion, manifesting the curse of dimensionality. Further, efficient state-space transformations permitting Markovian approximations, such as those afforded by recurrent neural networks (RNNs), are either structurally infeasible or are confounded by the curse of non-Markovianity. Considering these issues, we introduce, for the first time, an optimal stopping policy gradient algorithm (OSPG) that can leverage RNNs effectively in non-Markovian settings by implicitly optimizing value functions without recursion, mitigating the curse of non-Markovianity. The OSPG algorithm is derived from an inference procedure on a novel Bayesian network representation of discrete-time non-Markovian optimal stopping trajectories and, as a consequence, yields an offline policy gradient algorithm that eliminates expensive Monte Carlo policy rollouts.
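The paper's OSPG algorithm itself is not reproduced here, but the core idea it describes — training a recurrent stopping policy directly on a fixed batch of offline trajectories, with the value of the stochastic policy computed in closed form rather than by Monte Carlo rollouts of stopping decisions — can be illustrated with a minimal numpy sketch. Everything below is a hypothetical toy (the random-walk `paths`, the tiny tanh RNN, the `expected_payoff` objective, and the numerical-gradient ascent stand in for the paper's actual policy gradient), shown only to make the setup concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline trajectories: a batch of random-walk paths (hypothetical stand-in data).
B, T = 64, 10                        # batch size, horizon
paths = np.cumsum(rng.normal(0.0, 1.0, size=(B, T)), axis=1)
rewards = np.maximum(paths, 0.0)     # payoff received if we stop at step t

H = 4                                # RNN hidden size
theta = rng.normal(0.0, 0.3, size=H * H + 2 * H + 1)  # flat parameter vector

def unpack(th):
    U = th[:H * H].reshape(H, H)           # hidden-to-hidden weights
    w = th[H * H:H * H + H]                # input-to-hidden weights (scalar input)
    v = th[H * H + H:H * H + 2 * H]        # hidden-to-logit weights
    b = th[-1]                             # logit bias
    return U, w, v, b

def expected_payoff(th):
    """Exact expected payoff of the stochastic stopping policy on the offline
    paths. No Monte Carlo rollout of stopping decisions is needed, because
    P(stop at t) = p_t * prod_{s<t} (1 - p_s) is available in closed form."""
    U, w, v, b = unpack(th)
    h = np.zeros((B, H))
    survive = np.ones(B)                   # prob. of not having stopped before t
    J = np.zeros(B)
    for t in range(T):
        h = np.tanh(h @ U.T + np.outer(paths[:, t], w))       # RNN state update
        p = 1.0 / (1.0 + np.exp(-(h @ v + b)))                # stop prob. at t
        q = survive * p if t < T - 1 else survive             # forced stop at horizon
        J += q * rewards[:, t]
        survive = survive * (1.0 - p)
    return J.mean()

def num_grad(f, th, eps=1e-5):
    """Central-difference gradient (a stand-in for the paper's policy gradient)."""
    g = np.zeros_like(th)
    for i in range(th.size):
        d = np.zeros_like(th)
        d[i] = eps
        g[i] = (f(th + d) - f(th - d)) / (2 * eps)
    return g

J0 = expected_payoff(theta)
for _ in range(20):                        # plain gradient ascent, no new rollouts
    theta += 0.05 * num_grad(expected_payoff, theta)
J1 = expected_payoff(theta)
```

Because the recurrent hidden state summarizes the full path history, the stopping probability at each step can condition on non-Markovian features of the trajectory, which is the structural point the abstract makes about RNN-based policies.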
Cite
Text
Venkata and Bhattacharyya. "Deep Recurrent Optimal Stopping." Neural Information Processing Systems, 2023.
Markdown
[Venkata and Bhattacharyya. "Deep Recurrent Optimal Stopping." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/venkata2023neurips-deep/)
BibTeX
@inproceedings{venkata2023neurips-deep,
  title     = {{Deep Recurrent Optimal Stopping}},
  author    = {Venkata, Niranjan Damera and Bhattacharyya, Chiranjib},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/venkata2023neurips-deep/}
}