Deep Recurrent Optimal Stopping

Abstract

Deep neural networks (DNNs) have recently emerged as a powerful paradigm for solving Markovian optimal stopping problems. However, a ready extension of DNN-based methods to non-Markovian settings requires significant state and parameter space expansion, manifesting the curse of dimensionality. Further, efficient state-space transformations permitting Markovian approximations, such as those afforded by recurrent neural networks (RNNs), are either structurally infeasible or are confounded by the curse of non-Markovianity. Considering these issues, we introduce, for the first time, an optimal stopping policy gradient algorithm (OSPG) that can leverage RNNs effectively in non-Markovian settings by implicitly optimizing value functions without recursion, mitigating the curse of non-Markovianity. The OSPG algorithm is derived from an inference procedure on a novel Bayesian network representation of discrete-time non-Markovian optimal stopping trajectories and, as a consequence, yields an offline policy gradient algorithm that eliminates expensive Monte Carlo policy rollouts.
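Below is a minimal illustrative sketch, not the paper's OSPG algorithm, of the general idea the abstract describes: an RNN summarizes the (possibly non-Markovian) path history into per-step stopping probabilities, and, because the stopping decision does not alter the path dynamics, the expected payoff under the stochastic stopping policy can be evaluated in closed form over pre-simulated paths, so training is offline and requires no Monte Carlo policy rollouts. All names (StopPolicy, expected_payoff), the toy path generator, payoff, and hyperparameters are assumptions for illustration only.

# Illustrative sketch (assumed names and toy data), not the authors' OSPG implementation.
import torch
import torch.nn as nn

class StopPolicy(nn.Module):
    """GRU encoder of the path history, followed by a per-step stopping-probability head."""
    def __init__(self, obs_dim, hidden_dim=32):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, paths):                        # paths: (batch, T, obs_dim)
        h, _ = self.rnn(paths)                       # hidden state summarizes the history
        return torch.sigmoid(self.head(h)).squeeze(-1)  # stopping probs, shape (batch, T)

def expected_payoff(stop_prob, payoff):
    """Closed-form expected reward: P(stop at t) = p_t * prod_{s<t} (1 - p_s),
    with forced stopping at the final step of the horizon."""
    batch, T = stop_prob.shape
    cont = torch.cumprod(1.0 - stop_prob, dim=1)     # prob of not having stopped by t
    not_stopped_before = torch.cat([torch.ones(batch, 1), cont[:, :-1]], dim=1)
    q = torch.cat([stop_prob[:, :-1] * not_stopped_before[:, :-1],
                   not_stopped_before[:, -1:]], dim=1)   # stop-time distribution, sums to 1
    return (q * payoff).sum(dim=1).mean()

# Offline training on pre-simulated paths and payoffs (toy data, no policy rollouts).
T, batch, obs_dim = 50, 512, 1
paths = torch.cumsum(0.02 * torch.randn(batch, T, obs_dim), dim=1)  # toy path signal
payoff = torch.relu(paths.squeeze(-1))                              # toy payoff g_t

policy = StopPolicy(obs_dim)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = -expected_payoff(policy(paths), payoff)   # maximize expected payoff
    loss.backward()
    opt.step()

The sketch uses a smoothed (probabilistic) stopping rule so that the objective is differentiable end-to-end; how OSPG actually derives its gradient from the Bayesian network representation is detailed in the paper itself.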

Cite

Text

Venkata and Bhattacharyya. "Deep Recurrent Optimal Stopping." Neural Information Processing Systems, 2023.

Markdown

[Venkata and Bhattacharyya. "Deep Recurrent Optimal Stopping." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/venkata2023neurips-deep/)

BibTeX

@inproceedings{venkata2023neurips-deep,
  title     = {{Deep Recurrent Optimal Stopping}},
  author    = {Venkata, Niranjan Damera and Bhattacharyya, Chiranjib},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/venkata2023neurips-deep/}
}