Can Go AIs Be Adversarially Robust?
Abstract
Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study whether defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that though some of these defenses protect against previously discovered attacks, none withstand adaptive attacks. In particular, we are able to train new adversaries that reliably defeat our defended agents by causing them to blunder in ways humans would not. Our results suggest that building robust AI systems is challenging even for superhuman systems in narrow domains like Go.
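The iterated adversarial training defense mentioned in the abstract alternates rounds of attack and defense: train a new adversary against the current agent, then fine-tune the agent against that adversary. Below is a minimal Python sketch of that loop, assuming hypothetical train_adversary and finetune_victim training pipelines passed in as callables; none of these names come from the paper's actual code.

from typing import Callable, TypeVar

Agent = TypeVar("Agent")

def iterated_adversarial_training(
    victim: Agent,
    train_adversary: Callable[[Agent], Agent],
    finetune_victim: Callable[[Agent, Agent], Agent],
    rounds: int,
) -> Agent:
    # Each round: (1) train a fresh adversary that exploits the current
    # victim, then (2) fine-tune the victim on games against that
    # adversary to patch the weakness it found. Per the abstract, agents
    # hardened this way still fall to new adaptive adversaries trained
    # after the final round.
    for _ in range(rounds):
        adversary = train_adversary(victim)          # attack step (hypothetical pipeline)
        victim = finetune_victim(victim, adversary)  # defense step (hypothetical pipeline)
    return victim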
Cite
Text
Tseng et al. "Can Go AIs Be Adversarially Robust?" AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I26.34980
Markdown
[Tseng et al. "Can Go AIs Be Adversarially Robust?" AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/tseng2025aaai-go/) doi:10.1609/AAAI.V39I26.34980
BibTeX
@inproceedings{tseng2025aaai-go,
title = {{Can Go AIs Be Adversarially Robust?}},
author = {Tseng, Tom and McLean, Euan and Pelrine, Kellin and Wang, Tony Tong and Gleave, Adam},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {27662--27670},
doi = {10.1609/AAAI.V39I26.34980},
url = {https://mlanthology.org/aaai/2025/tseng2025aaai-go/}
}