Spatio-Causal Patterns of Sample Growth
Abstract
Different statistical samples (e.g., from different locations) provide populations and learning systems with observations that have distinct statistical properties. Samples under (1) "Unconfounded" growth preserve systems' ability to determine their variables' effects on outcomes of interest (and therefore lead to interpretable black-box predictions). Samples under (2) "Externally-Valid" growth preserve systems' ability to make predictions that generalize across out-of-sample variation. The first kind of growth yields predictions that generalize across sample populations; the second, across their common unobserved factors. We illustrate these theoretical patterns using the full American census from 1840 to 1940, with samples ranging from the street level to the national level. This analysis reveals new conditions for the generalizability of samples over space and time, as well as connections among the Shapley value, counterfactual statistics, and hyperbolic geometry.
Cite
Text
Ribeiro. "Spatio-Causal Patterns of Sample Growth." Journal of Artificial Intelligence Research, 2025. doi:10.1613/JAIR.1.15675
Markdown
[Ribeiro. "Spatio-Causal Patterns of Sample Growth." Journal of Artificial Intelligence Research, 2025.](https://mlanthology.org/jair/2025/ribeiro2025jair-spatiocausal/) doi:10.1613/JAIR.1.15675
BibTeX
@article{ribeiro2025jair-spatiocausal,
title = {{Spatio-Causal Patterns of Sample Growth}},
author = {Ribeiro, Andre F.},
journal = {Journal of Artificial Intelligence Research},
year = {2025},
doi = {10.1613/JAIR.1.15675},
volume = {83},
url = {https://mlanthology.org/jair/2025/ribeiro2025jair-spatiocausal/}
}