Expediting Reinforcement Learning by Incorporating Temporal Causal Information

Abstract

Reinforcement learning (RL) algorithms struggle to learn optimal policies for tasks in which reward feedback is sparse and depends on a complex sequence of events in the environment. Probabilistic reward machines (PRMs) are finite-state formalisms that can capture temporal dependencies in the reward signal along with nondeterministic task outcomes. While specialized RL algorithms can exploit this finite-state structure to expedite learning, PRMs remain difficult to modify and design by hand. This hinders the already difficult tasks of utilizing high-level causal knowledge about the environment and transferring the reward formalism to a new domain with a different causal structure. This paper proposes a novel method for incorporating causal information, in the form of Temporal Logic-based Causal Diagrams, into the reward formalism, thereby expediting policy learning and aiding the transfer of task specifications to new environments.
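
As a rough illustration of the finite-state structure the abstract refers to, the sketch below models a PRM in Python as a set of states with probabilistic, reward-labeled transitions fired by high-level propositions observed in the environment. The class name, fields, and the toy key-and-goal task are assumptions made for illustration only and are not taken from the paper.

import random
from dataclasses import dataclass, field

# Minimal sketch of a probabilistic reward machine (PRM), for illustration only:
# transitions fire on sets of propositions and lead to a successor state and
# reward sampled from a probability distribution (nondeterministic outcomes).
@dataclass
class ProbabilisticRewardMachine:
    initial_state: str
    # delta[state][props] -> list of (next_state, reward, probability)
    delta: dict = field(default_factory=dict)

    def step(self, state: str, props: frozenset) -> tuple:
        """Sample a successor state and reward for the observed propositions."""
        # Default to a zero-reward self-loop if no transition is defined.
        outcomes = self.delta.get(state, {}).get(props, [(state, 0.0, 1.0)])
        states_rewards = [(s, r) for s, r, _ in outcomes]
        weights = [p for _, _, p in outcomes]
        return random.choices(states_rewards, weights=weights, k=1)[0]

# Hypothetical task: picking up a key succeeds only with probability 0.9,
# and reaching the goal afterwards yields reward 1.
prm = ProbabilisticRewardMachine(
    initial_state="u0",
    delta={
        "u0": {frozenset({"key"}): [("u1", 0.0, 0.9), ("u0", 0.0, 0.1)]},
        "u1": {frozenset({"goal"}): [("u_acc", 1.0, 1.0)]},
    },
)
print(prm.step("u0", frozenset({"key"})))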

Cite

Text

Corazza et al. "Expediting Reinforcement Learning by Incorporating Temporal Causal Information." NeurIPS 2023 Workshops: CRL, 2023.

Markdown

[Corazza et al. "Expediting Reinforcement Learning by Incorporating Temporal Causal Information." NeurIPS 2023 Workshops: CRL, 2023.](https://mlanthology.org/neuripsw/2023/corazza2023neuripsw-expediting/)

BibTeX

@inproceedings{corazza2023neuripsw-expediting,
  title     = {{Expediting Reinforcement Learning by Incorporating Temporal Causal Information}},
  author    = {Corazza, Jan and Aria, Hadi Partovi and Neider, Daniel and Xu, Zhe},
  booktitle = {NeurIPS 2023 Workshops: CRL},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/corazza2023neuripsw-expediting/}
}