Towards Defining Deception in Structural Causal Games

Abstract

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals. There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a functional definition of deception in structural causal games, grounded in the philosophical literature. We present several examples to establish that our formal definition captures philosophical and commonsense desiderata for deception.

Cite

Text

Ward. "Towards Defining Deception in Structural Causal Games." NeurIPS 2022 Workshops: MLSW, 2022.

Markdown

[Ward. "Towards Defining Deception in Structural Causal Games." NeurIPS 2022 Workshops: MLSW, 2022.](https://mlanthology.org/neuripsw/2022/ward2022neuripsw-defining/)

BibTeX

@inproceedings{ward2022neuripsw-defining,
  title     = {{Towards Defining Deception in Structural Causal Games}},
  author    = {Ward, Francis Rhys},
  booktitle = {NeurIPS 2022 Workshops: MLSW},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/ward2022neuripsw-defining/}
}