Spatio-Causal Patterns of Sample Growth

Abstract

Different statistical samples (e.g., from different locations) offer populations and learning systems observations with distinct statistical properties. Samples under (1) ‘Unconfounded’ growth preserve systems’ ability to determine their variables’ effects on outcomes of interest (and therefore lead to interpretable black-box predictions). Samples under (2) ‘Externally-Valid’ growth preserve systems’ ability to make predictions that generalize across out-of-sample variation. The first generates predictions that generalize over sample populations, the second over their common unobserved factors. We illustrate these theoretical patterns in the full American census from 1840 to 1940, with samples ranging from the street level to the national level. This reveals new conditions for the generalizability of samples over space and time, and connections among the Shapley value, counterfactual statistics, and hyperbolic geometry.

Cite

Text

Ribeiro. "Spatio-Causal Patterns of Sample Growth." Journal of Artificial Intelligence Research, 2025. doi:10.1613/JAIR.1.15675

Markdown

[Ribeiro. "Spatio-Causal Patterns of Sample Growth." Journal of Artificial Intelligence Research, 2025.](https://mlanthology.org/jair/2025/ribeiro2025jair-spatiocausal/) doi:10.1613/JAIR.1.15675

BibTeX

@article{ribeiro2025jair-spatiocausal,
  title     = {{Spatio-Causal Patterns of Sample Growth}},
  author    = {Ribeiro, Andre F.},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2025},
  doi       = {10.1613/JAIR.1.15675},
  volume    = {83},
  url       = {https://mlanthology.org/jair/2025/ribeiro2025jair-spatiocausal/}
}