Extending Policy Shaping to Continuous State Spaces (Student Abstract)

Abstract

Policy Shaping is a Human-in-the-loop Reinforcement Learning (HRL) algorithm. We extend this work to continuous state spaces with our algorithm, Deep Policy Shaping (DPS). DPS combines an RL algorithm with a feedback neural network that learns the optimality of actions from noisy human feedback. In simulation, we find that DPS outperforms or matches baselines when averaged over multiple hyperparameter settings and varying levels of feedback correctness.
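
The abstract's core mechanism, a feedback model paired with an RL policy, follows the multiplicative combination rule of the original Policy Shaping algorithm (final policy proportional to the product of the RL policy and the feedback policy). Below is a minimal PyTorch sketch of that idea; the names (FeedbackNet, combine_policies), the network architecture, and the training loss are our own illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackNet(nn.Module):
    """Hypothetical feedback network: estimates P(action is optimal | state)
    from noisy binary trainer feedback."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        # Per-action probability that the action is optimal in this state.
        return torch.sigmoid(self.net(state))

def feedback_loss(net, states, actions, labels):
    # labels: 1.0 if the trainer approved the taken action, 0.0 if not
    # (labels may be noisy, i.e., sometimes incorrect).
    probs = net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return F.binary_cross_entropy(probs, labels)

def combine_policies(q_values, feedback_probs, eps=1e-8):
    # Policy Shaping-style combination: multiply the RL policy by the
    # (normalized) feedback policy, then renormalize.
    rl_policy = F.softmax(q_values, dim=-1)
    fb_policy = feedback_probs / (feedback_probs.sum(-1, keepdim=True) + eps)
    combined = rl_policy * fb_policy
    return combined / (combined.sum(-1, keepdim=True) + eps)

# Example usage: 4-dimensional continuous state, 2 discrete actions.
net = FeedbackNet(state_dim=4, n_actions=2)
q = torch.randn(1, 2)            # Q-values from any RL learner
fb = net(torch.randn(1, 4))      # feedback network's optimality estimates
action_dist = combine_policies(q, fb)

The multiplicative rule means either source can veto an action it considers bad, while agreement between the two sharpens the final distribution.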

Cite

Text

Wei et al. "Extending Policy Shaping to Continuous State Spaces (Student Abstract)." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/aaai.v35i18.17956

Markdown

[Wei et al. "Extending Policy Shaping to Continuous State Spaces (Student Abstract)." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/wei2021aaai-extending/) doi:10.1609/aaai.v35i18.17956

BibTeX

@inproceedings{wei2021aaai-extending,
  title     = {{Extending Policy Shaping to Continuous State Spaces (Student Abstract)}},
  author    = {Wei, Thomas Benjamin and Faulkner, Taylor A. Kessler and Thomaz, Andrea Lockerd},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {15919--15920},
  doi       = {10.1609/aaai.v35i18.17956},
  url       = {https://mlanthology.org/aaai/2021/wei2021aaai-extending/}
}