Systematically Exploring Associations Among Multivariate Data

Abstract

Detecting relationships among multivariate data is often of great importance in the analysis of high-dimensional data sets and has received growing attention for decades in both academia and industry. In this study, we propose a statistical tool named the neighbor correlation coefficient (nCor), which is based on a new idea: it measures the local continuity of the reordered data points to quantify the strength of the global association between variables. With a sufficient sample size, the new method is able to capture a wide range of functional relationships, whether linear or nonlinear, bivariate or multivariate, main effect or interaction. The nCor score roughly approximates the coefficient of determination (R²) of the data, that is, the proportion of variance in one variable that is predictable from one or more other variables. On this basis, three nCor-based statistics are also proposed to further characterize the intra- and inter-structures of the associations in terms of nonlinearity, interaction effect, and variable redundancy. The mechanisms of these measures are proved in theory and demonstrated with numerical analyses.

Introduction

Identifying relationships among variables is one of the most critical issues in data analysis and interpretation (Altman and Krzywinski 2015), with a wide range of applications in diverse fields from data science to neuroscience. Nowadays, however, a large data set may contain a vast number of variable pairs and combinations that are difficult to examine manually (Reshef et al. 2011). Association measures can be used to quickly find the significant associations scattered among thousands or even millions of potential relationships without modelling the relationships explicitly, and thereby provide valuable knowledge and promising pointers for future study. Consider a data sample {(x(t), y(t)) | 1 ≤ t ≤ N} that is observed from an underlying functional relationship expressed as follows.
y = f(x) + e = ∑
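
The excerpt does not give the actual definition of nCor, so the following is only a minimal sketch of the general idea stated in the abstract, under the model y = f(x) + e: reorder the samples by x, measure how continuous y is between neighboring points, and turn that into a score comparable to R². It swaps in a standard difference-based noise-variance estimate (half the mean squared difference between neighbors), which is an assumption of this sketch and not necessarily the author's construction. The function name `neighbor_score` and the synthetic data are hypothetical.

```python
import numpy as np


def neighbor_score(x, y):
    """Illustrative R^2-like score from the local continuity of y after sorting by x.

    Assumption: this is NOT the paper's nCor formula, only a sketch of the idea.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Reorder the data points so that neighbors in the array are neighbors in x.
    order = np.argsort(x, kind="stable")
    y_sorted = y[order]
    # If f is smooth, neighboring differences are dominated by the noise e,
    # and E[(e_i - e_j)^2] = 2 * Var(e) for independent noise terms.
    diffs = np.diff(y_sorted)
    noise_var = np.sum(diffs ** 2) / (2 * (len(y) - 1))
    total_var = np.var(y, ddof=1)
    # Proportion of the variance of y that is not attributed to noise.
    return 1.0 - noise_var / total_var


# Synthetic example (hypothetical data): a nonlinear relationship and an unrelated pair.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-3, 3, n)
y_dependent = np.sin(2 * x) + rng.normal(0, 0.5, n)
y_independent = rng.normal(0, 1, n)

score_dependent = neighbor_score(x, y_dependent)
score_independent = neighbor_score(x, y_independent)
print(f"dependent:   {score_dependent:.3f}")
print(f"independent: {score_independent:.3f}")
```

With this setup the dependent pair should score well above zero, roughly in line with the share of variance explained by sin(2x), while the independent pair should score close to zero. The sketch handles only the bivariate case; the multivariate and interaction cases mentioned in the abstract would need a neighbor ordering in more than one dimension, which is not attempted here.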

Cite

Text

Zhang. "Systematically Exploring Associations Among Multivariate Data." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I04.6158

Markdown

[Zhang. "Systematically Exploring Associations Among Multivariate Data." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/zhang2020aaai-systematically/) doi:10.1609/AAAI.V34I04.6158

BibTeX

@inproceedings{zhang2020aaai-systematically,
  title     = {{Systematically Exploring Associations Among Multivariate Data}},
  author    = {Zhang, Lifeng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {6786-6794},
  doi       = {10.1609/AAAI.V34I04.6158},
  url       = {https://mlanthology.org/aaai/2020/zhang2020aaai-systematically/}
}