Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation

Abstract

Online services rely on CAPTCHAs as a first line of defense against automated abuse, yet recent advances in multi-modal large language models (MLLMs) have eroded the effectiveness of conventional designs that focus on text recognition or 2D image understanding. To address this challenge, we present **Spatial CAPTCHA**, a novel human-verification framework that leverages fundamental differences in spatial reasoning between humans and MLLMs. Unlike existing CAPTCHAs that rely on low-level perception tasks vulnerable to modern AI, Spatial CAPTCHA generates dynamic questions requiring geometric reasoning, perspective-taking, occlusion handling, and mental rotation, skills that are intuitive for humans but difficult for current AI systems. The system employs a procedural generation pipeline with constraint-based difficulty control, automated correctness verification, and human-in-the-loop validation to ensure scalability, robustness, and adaptability. Evaluation on a corresponding benchmark, **Spatial-CAPTCHA-Bench**, demonstrates that humans vastly outperform 10 state-of-the-art MLLMs, with the best model achieving only 31.0% Pass@1 accuracy. A comparison of results with Google reCAPTCHA further confirms the effectiveness of Spatial CAPTCHA as both a security mechanism and a diagnostic tool for spatial reasoning in AI.
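
To make the idea of procedural generation with automated correctness verification concrete, the following is a toy sketch and not the authors' pipeline. It assumes a much simpler setting than the paper's: flat grid shapes (polyominoes) instead of rendered 3D scenes, with a mental-rotation question whose distractors are mirror images. All names (`generate_item`, `n_cells`, `n_options`) are hypothetical, and `n_cells` stands in for a difficulty constraint.

```python
import random


def normalize(cells):
    """Translate a shape so its minimum x and y are both zero."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def rotate90(cells):
    """Rotate a shape by 90 degrees and re-normalize it."""
    return normalize({(-y, x) for x, y in cells})


def mirror(cells):
    """Reflect a shape across the vertical axis and re-normalize it."""
    return normalize({(-x, y) for x, y in cells})


def rotations(cells):
    """Return the set of distinct 0/90/180/270 degree rotations of a shape."""
    found, current = set(), normalize(cells)
    for _ in range(4):
        found.add(current)
        current = rotate90(current)
    return found


def random_shape(n_cells, rng):
    """Grow a connected grid shape; n_cells is a hypothetical difficulty knob."""
    cells = {(0, 0)}
    while len(cells) < n_cells:
        x, y = rng.choice(sorted(cells))
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        cells.add((x + dx, y + dy))
    return normalize(cells)


def generate_item(n_cells, n_options=4, seed=0, max_tries=1000):
    """Generate one question: which option is a rotation of the reference?"""
    rng = random.Random(seed)
    for _ in range(max_tries):
        shape = random_shape(n_cells, rng)
        valid = rotations(shape)
        distractors = rotations(mirror(shape))
        # Constraint: reject shapes whose mirror image is also a rotation,
        # since the question would then have more than one right answer.
        if valid & distractors or len(distractors) < n_options - 1:
            continue
        answer = rng.choice(sorted(valid, key=sorted))
        options = rng.sample(sorted(distractors, key=sorted), n_options - 1)
        options.append(answer)
        rng.shuffle(options)
        # Automated verification: exactly one option must be a true rotation.
        correct = [i for i, opt in enumerate(options) if opt in valid]
        if len(correct) == 1:
            return {"reference": shape, "options": options,
                    "answer_index": correct[0]}
    raise RuntimeError("no unambiguous item found; relax the constraints")


if __name__ == "__main__":
    item = generate_item(n_cells=5, seed=1)
    print("answer index:", item["answer_index"])
```

The verification step is independent of how the options were built, so it would also catch ambiguous items if the distractor strategy were changed. A real system would additionally render the shapes as images and, per the abstract, pass items through human-in-the-loop validation.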

Cite

Text

Kharlamova et al. "Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation." International Conference on Learning Representations, 2026.

Markdown

[Kharlamova et al. "Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/kharlamova2026iclr-spatial/)

BibTeX

@inproceedings{kharlamova2026iclr-spatial,
  title     = {{Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation}},
  author    = {Kharlamova, Arina and He, Bowei and Ma, Chen and Liu, Xue},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/kharlamova2026iclr-spatial/}
}