"You Just Can’t Go Around Killing People'' Explaining Agent Behavior to a Human Terminator

Abstract

Consider a setting where a pre-trained agent operates in an environment and a human operator can decide to temporarily terminate its operation and take over for some duration of time. Such scenarios are common in human-machine interaction, for example in autonomous driving, factory automation, and healthcare. In these settings, we typically observe a trade-off between two extreme cases: if no take-overs are allowed, the agent might employ a sub-optimal, possibly dangerous policy; alternatively, if there are too many take-overs, the human has no confidence in the agent, greatly limiting its usefulness. In this paper, we formalize this setup and propose an explainability scheme to help optimize the number of human interventions.
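The interaction protocol described in the abstract can be made concrete with a short simulation loop. The sketch below is purely illustrative and is not the paper's formalization: the names agent_policy, human_policy, and takeover_request, the fixed take-over duration, and the use of the Gymnasium environment API are all assumptions introduced here.

import gymnasium as gym  # assumed Gymnasium-style reset/step API

def run_with_takeovers(env, agent_policy, human_policy, takeover_request,
                       horizon=1000, takeover_len=10):
    # Roll out a pre-trained agent while a human operator may temporarily
    # terminate it and take over control for a fixed number of steps.
    # All callables are illustrative placeholders, not the paper's method:
    #   agent_policy(obs)     -> action of the pre-trained agent
    #   human_policy(obs)     -> action chosen by the human operator
    #   takeover_request(obs) -> True when the human decides to intervene
    obs, _ = env.reset()
    takeovers = 0      # number of interventions (the quantity to optimize)
    control_left = 0   # remaining steps of the current human take-over
    for _ in range(horizon):
        if control_left == 0 and takeover_request(obs):
            takeovers += 1
            control_left = takeover_len  # human controls for a fixed duration
        if control_left > 0:
            action = human_policy(obs)
            control_left -= 1
        else:
            action = agent_policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    return takeovers

# Toy usage with random placeholder policies:
env = gym.make("CartPole-v1")
n = run_with_takeovers(
    env,
    agent_policy=lambda obs: env.action_space.sample(),
    human_policy=lambda obs: env.action_space.sample(),
    takeover_request=lambda obs: False,  # never intervene in this toy run
)
print(n)

Loosening takeover_request or lengthening takeover_len moves the rollout along the trade-off sketched above: more human control at the cost of more interventions.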

Cite

Text

Menkes et al. "“You Just Can’t Go Around Killing People” Explaining Agent Behavior to a Human Terminator." ICML 2024 Workshops: MFHAIA, 2024.

Markdown

[Menkes et al. "“You Just Can’t Go Around Killing People” Explaining Agent Behavior to a Human Terminator." ICML 2024 Workshops: MFHAIA, 2024.](https://mlanthology.org/icmlw/2024/menkes2024icmlw-you/)

BibTeX

@inproceedings{menkes2024icmlw-you,
  title     = {{"You Just Can’t Go Around Killing People'' Explaining Agent Behavior to a Human Terminator}},
  author    = {Menkes, Uri and Amir, Ofra and Hallak, Assaf},
  booktitle = {ICML 2024 Workshops: MFHAIA},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/menkes2024icmlw-you/}
}