Adversarial Policies Beat Professional-Level Go AIs

Abstract

We attack the state-of-the-art Go-playing AI system, KataGo, by training an adversarial policy that plays against a frozen KataGo victim. Our attack achieves a >99% win rate against KataGo playing without search, and a >80% win rate when KataGo uses enough search to be near-superhuman. To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional. Notably, the adversary does not win by learning to play Go better than KataGo---in fact, the adversary is easily beaten by human amateurs. Instead, the adversary wins by tricking KataGo into ending the game prematurely at a point that is favorable to the adversary. Our results demonstrate that even professional-level AI systems may harbor surprising failure modes.

Cite

Text

Wang et al. "Adversarial Policies Beat Professional-Level Go AIs." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown

[Wang et al. "Adversarial Policies Beat Professional-Level Go AIs." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/wang2022neuripsw-adversarial/)

BibTeX

@inproceedings{wang2022neuripsw-adversarial,
  title     = {{Adversarial Policies Beat Professional-Level Go AIs}},
  author    = {Wang, Tony Tong and Gleave, Adam and Belrose, Nora and Tseng, Tom and Dennis, Michael D. and Duan, Yawen and Pogrebniak, Viktor and Miller, Joseph and Levine, Sergey and Russell, Stuart},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/wang2022neuripsw-adversarial/}
}