Policy Shaping with Human Teachers

Thomas Cederborg, Ishaan Grover, Charles L. Isbell Jr., Andrea Lockerd Thomaz

IJCAI 2015 pp. 3366-3372

Abstract

In this work we evaluate the performance of a policy shaping algorithm using 26 human teachers. We examine whether the algorithm is suitable for human-generated data on two different boards in a Pac-Man domain, comparing performance to an oracle that provides critique based on one known winning policy. Perhaps surprisingly, we show that the data generated by our 26 participants yields even better performance for the agent than data generated by the oracle. This might be because humans do not discourage exploring multiple winning policies. Additionally, we evaluate the impact of different verbal instructions and different interpretations of silence, finding that the usefulness of data is affected both by what instructions are given to teachers and by how the data is interpreted.

Cite

Text

Cederborg et al. "Policy Shaping with Human Teachers." International Joint Conference on Artificial Intelligence, 2015.

Markdown

[Cederborg et al. "Policy Shaping with Human Teachers." International Joint Conference on Artificial Intelligence, 2015.](https://mlanthology.org/ijcai/2015/cederborg2015ijcai-policy/)

BibTeX

@inproceedings{cederborg2015ijcai-policy,
  title     = {{Policy Shaping with Human Teachers}},
  author    = {Cederborg, Thomas and Grover, Ishaan and Isbell, Jr., Charles L. and Thomaz, Andrea Lockerd},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {3366--3372},
  url       = {https://mlanthology.org/ijcai/2015/cederborg2015ijcai-policy/}
}