Separating Skills from Preference: Using Learning to Program by Reward

Abstract

Developers of artificial agents commonly assume that we can only specify agent behavior via the expensive process of implementing new skills. This paper offers an alternative expressed by the separation hypothesis: that behavioral differences among individuals can be captured as distinct preferences over the same set of skills. We test this hypothesis in a simulated automotive domain by using reinforcement learning to induce vehicle control policies, given a structured set of driving skills that contains options and a user-supplied reward function. We show that qualitatively distinct reward functions produce agents with qualitatively distinct behavior over the same set of skills. This leads to a new development metaphor that we call "programming by reward".

Cite

Text

Shapiro and Langley. "Separating Skills from Preference: Using Learning to Program by Reward." International Conference on Machine Learning, 2002.

Markdown

[Shapiro and Langley. "Separating Skills from Preference: Using Learning to Program by Reward." International Conference on Machine Learning, 2002.](https://mlanthology.org/icml/2002/shapiro2002icml-separating/)

BibTeX

@inproceedings{shapiro2002icml-separating,
  title     = {{Separating Skills from Preference: Using Learning to Program by Reward}},
  author    = {Shapiro, Daniel G. and Langley, Pat},
  booktitle = {International Conference on Machine Learning},
  year      = {2002},
  pages     = {570--577},
  url       = {https://mlanthology.org/icml/2002/shapiro2002icml-separating/}
}