Off-Policy Differentiable Logic Reinforcement Learning
Abstract
In this paper, we propose an Off-Policy Differentiable Logic Reinforcement Learning (OPDLRL) framework that inherits the interpretability and generalization benefits of Differentiable Inductive Logic Programming (DILP) while resolving its weaknesses in execution efficiency, stability, and scalability. The key contributions include the use of approximate inference to significantly reduce the number of logic rules in the deduction process, an off-policy training method to enable approximate inference, and a distributed and hierarchical training framework. Extensive experiments, notably playing real-time video games in Rabbids against human players, show that OPDLRL matches or exceeds the performance of other DILP-based methods while being far more practical in terms of sample efficiency and execution efficiency, making it applicable to complex and (near) real-time domains.
Cite
Text
Zhang et al. "Off-Policy Differentiable Logic Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86520-7_38
Markdown
[Zhang et al. "Off-Policy Differentiable Logic Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/zhang2021ecmlpkdd-offpolicy/) doi:10.1007/978-3-030-86520-7_38
BibTeX
@inproceedings{zhang2021ecmlpkdd-offpolicy,
title = {{Off-Policy Differentiable Logic Reinforcement Learning}},
author = {Zhang, Li and Li, Xin and Wang, Mingzhong and Tian, Andong},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2021},
pages = {617--632},
doi = {10.1007/978-3-030-86520-7_38},
url = {https://mlanthology.org/ecmlpkdd/2021/zhang2021ecmlpkdd-offpolicy/}
}