Reinforcement Learning for Trading

Abstract

We propose to train trading systems by optimizing financial objective functions via reinforcement learning. The performance functions that we consider are profit or wealth, the Sharpe ratio and our recently proposed differential Sharpe ratio for online learning. In Moody & Wu (1997), we presented empirical results that demonstrate the advantages of reinforcement learning relative to supervised learning. Here we extend our previous work to compare Q-Learning to our Recurrent Reinforcement Learning (RRL) algorithm. We provide new simulation results that demonstrate the presence of predictability in the monthly S&P 500 Stock Index for the 25-year period 1970 through 1994, as well as a sensitivity analysis that provides economic insight into the trader's structure.
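
The differential Sharpe ratio referenced above is an online, exponentially weighted approximation to the Sharpe ratio that supplies a per-period reward an adaptive trader can maximize directly. Below is a minimal Python sketch of that update, following the recursions in Moody & Wu (1997); the class name, the adaptation rate eta, and the numerical variance guard are illustrative assumptions rather than details taken from the paper.

class DifferentialSharpe:
    """Online estimate of how much the latest return R_t moves an
    exponentially weighted Sharpe ratio (the differential Sharpe ratio)."""

    def __init__(self, eta=0.01):
        self.eta = eta   # adaptation rate of the moving averages (assumed value)
        self.A = 0.0     # exponential moving average of returns
        self.B = 0.0     # exponential moving average of squared returns

    def update(self, R):
        dA = R - self.A              # innovation in the first moment
        dB = R * R - self.B          # innovation in the second moment
        var = self.B - self.A ** 2   # running variance estimate
        # First-order sensitivity of the moving Sharpe ratio to R_t,
        # computed with the pre-update moments A_{t-1} and B_{t-1};
        # the 1e-12 floor is a numerical guard, an assumption of this sketch.
        D = (self.B * dA - 0.5 * self.A * dB) / max(var, 1e-12) ** 1.5
        self.A += self.eta * dA      # update the moments after measuring D
        self.B += self.eta * dB
        return D

Feeding each period's trading return through update() yields a reward signal that an online learner such as the RRL trader can optimize step by step, rather than waiting for a full-history Sharpe ratio to be computed.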

Cite

Text

Moody and Saffell. "Reinforcement Learning for Trading." Neural Information Processing Systems, 1998.

Markdown

[Moody and Saffell. "Reinforcement Learning for Trading." Neural Information Processing Systems, 1998.](https://mlanthology.org/neurips/1998/moody1998neurips-reinforcement/)

BibTeX

@inproceedings{moody1998neurips-reinforcement,
  title     = {{Reinforcement Learning for Trading}},
  author    = {Moody, John E. and Saffell, Matthew},
  booktitle = {Neural Information Processing Systems},
  year      = {1998},
  pages     = {917--923},
  url       = {https://mlanthology.org/neurips/1998/moody1998neurips-reinforcement/}
}