Off-Policy Differentiable Logic Reinforcement Learning

Abstract

In this paper, we propose an Off-Policy Differentiable Logic Reinforcement Learning (OPDLRL) framework that inherits the interpretability and generalization ability of Differentiable Inductive Logic Programming (DILP) while resolving its weaknesses in execution efficiency, stability, and scalability. The key contributions include the use of approximate inference to significantly reduce the number of logic rules in the deduction process, an off-policy training method to enable approximate inference, and a distributed and hierarchical training framework. Extensive experiments, in particular playing real-time video games in Rabbids against human players, show that OPDLRL achieves performance comparable to or better than other DILP-based methods while being far more practical in terms of sample efficiency and execution efficiency, making it applicable to complex and (near) real-time domains.

Cite

Text

Zhang et al. "Off-Policy Differentiable Logic Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86520-7_38

Markdown

[Zhang et al. "Off-Policy Differentiable Logic Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/zhang2021ecmlpkdd-offpolicy/) doi:10.1007/978-3-030-86520-7_38

BibTeX

@inproceedings{zhang2021ecmlpkdd-offpolicy,
  title     = {{Off-Policy Differentiable Logic Reinforcement Learning}},
  author    = {Zhang, Li and Li, Xin and Wang, Mingzhong and Tian, Andong},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {617--632},
  doi       = {10.1007/978-3-030-86520-7_38},
  url       = {https://mlanthology.org/ecmlpkdd/2021/zhang2021ecmlpkdd-offpolicy/}
}