Off-Policy Differentiable Logic Reinforcement Learning

Abstract

In this paper, we propose an Off-Policy Differentiable Logic Reinforcement Learning (OPDLRL) framework that inherits the interpretability and generalization ability of Differentiable Inductive Logic Programming (DILP) while resolving its weaknesses in execution efficiency, stability, and scalability. The key contributions include the use of approximate inference to significantly reduce the number of logic rules in the deduction process, an off-policy training method to enable approximate inference, and a distributed and hierarchical training framework. Extensive experiments, in particular playing real-time video games in Rabbids against human players, show that OPDLRL achieves performance comparable to or better than other DILP-based methods while being far more practical in terms of sample efficiency and execution efficiency, making it applicable to complex and (near) real-time domains.

Cite

Text

Zhang et al. "Off-Policy Differentiable Logic Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86520-7_38

Markdown

[Zhang et al. "Off-Policy Differentiable Logic Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/zhang2021ecmlpkdd-offpolicy/) doi:10.1007/978-3-030-86520-7_38

BibTeX

@inproceedings{zhang2021ecmlpkdd-offpolicy,
  title     = {{Off-Policy Differentiable Logic Reinforcement Learning}},
  author    = {Zhang, Li and Li, Xin and Wang, Mingzhong and Tian, Andong},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {617--632},
  doi       = {10.1007/978-3-030-86520-7_38},
  url       = {https://mlanthology.org/ecmlpkdd/2021/zhang2021ecmlpkdd-offpolicy/}
}