A KNN-Based Non-Parametric Conditional Independence Test for Mixed Data and Application in Causal Discovery
Abstract
Testing for Conditional Independence (CI) is a fundamental task for causal discovery but is particularly challenging in mixed discrete-continuous data. In this context, inadequate assumptions or discretization of continuous variables reduce the CI test’s statistical power, which yields incorrect learned causal structures. In this work, we present a non-parametric CI test leveraging k-nearest neighbor (kNN) methods that are adaptive to mixed discrete-continuous data. In particular, a kNN-based conditional mutual information estimator serves as the test statistic, and the p-value is calculated using a kNN-based local permutation scheme. We prove the CI test’s statistical validity and power in mixed discrete-continuous data, which yields consistency when used in constraint-based causal discovery. An extensive evaluation on synthetic and real-world data shows that the proposed CI test outperforms state-of-the-art approaches in the accuracy of CI testing and causal discovery, particularly in settings with low sample sizes.
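The two ingredients named in the abstract, a kNN-based conditional mutual information (CMI) statistic and a kNN-based local permutation p-value, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`knn_cmi`, `local_permutation_pvalue`), the max-norm distance, the tie-inclusive neighbour counts, and the defaults `k=5`, `k_perm=5`, `n_perm=100` are all assumptions made for illustration, and discrete variables are assumed to be numerically encoded.

```python
import numpy as np

def _digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max() + 1))))
    return harmonic[n - 1] - np.euler_gamma

def _cheb(a):
    """Pairwise max-norm distances between the rows of an (n, d) array."""
    return np.abs(a[:, None, :] - a[None, :, :]).max(axis=2)

def knn_cmi(x, y, z, k=5):
    """kNN estimate of I(X;Y|Z) in nats (assumed variant, requires n > k).

    Neighbour counts include ties (distance <= radius), so repeated
    values in discrete columns are counted rather than discarded.
    """
    x, y, z = (np.asarray(v, dtype=float).reshape(len(v), -1) for v in (x, y, z))
    dx, dy, dz = _cheb(x), _cheb(y), _cheb(z)
    dxz, dyz = np.maximum(dx, dz), np.maximum(dy, dz)
    dxyz = np.maximum(dxz, dy)
    # Radius: distance to the k-th nearest neighbour in the joint space.
    # Index k (0-based) skips the point itself, which sits at distance 0.
    rho = np.partition(dxyz, k, axis=1)[:, k][:, None]
    count = lambda d: (d <= rho).sum(axis=1) - 1  # minus 1 excludes the point itself
    k_tilde, n_xz, n_yz, n_z = count(dxyz), count(dxz), count(dyz), count(dz)
    return float(np.mean(_digamma_int(k_tilde) - _digamma_int(n_xz)
                         - _digamma_int(n_yz) + _digamma_int(n_z)))

def local_permutation_pvalue(x, y, z, k=5, k_perm=5, n_perm=100, seed=0):
    """Permutation p-value: X is shuffled only among points that are close in Z."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    dz = _cheb(np.asarray(z, dtype=float).reshape(len(x), -1))
    # Candidate donors for point i: every point within the distance of its
    # k_perm-th nearest Z-neighbour (self included, ties included).
    radius = np.partition(dz, k_perm - 1, axis=1)[:, k_perm - 1]
    cands = [np.flatnonzero(dz[i] <= radius[i]) for i in range(len(x))]
    stat = knn_cmi(x, y, z, k)
    null = np.empty(n_perm)
    for b in range(n_perm):
        used, perm = set(), np.empty(len(x), dtype=int)
        for i in rng.permutation(len(x)):
            cand = rng.permutation(cands[i]).tolist()
            # Prefer a donor not used yet; fall back to a reused one if none is left.
            j = next((c for c in cand if c not in used), cand[0])
            perm[i] = j
            used.add(j)
        null[b] = knn_cmi(x[perm], y, z, k)
    p = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, float(p)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = rng.integers(0, 3, 200)              # discrete conditioning variable
    x = z + rng.normal(size=200)             # continuous, depends on z
    y = x + 0.5 * rng.normal(size=200)       # depends on x even given z
    print(local_permutation_pvalue(x, y, z))
```

Restricting the shuffle of X to Z-neighbourhoods is what keeps the dependence of X on Z intact while breaking any remaining link to Y. With a purely discrete Z, the tie-inclusive neighbourhood used here reduces to permuting X within each level of Z. The pairwise distance matrices make this sketch quadratic in the sample size, so it is only suitable for small samples.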
Cite
Text
Huegle et al. "A KNN-Based Non-Parametric Conditional Independence Test for Mixed Data and Application in Causal Discovery." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43412-9_32
Markdown
[Huegle et al. "A KNN-Based Non-Parametric Conditional Independence Test for Mixed Data and Application in Causal Discovery." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/huegle2023ecmlpkdd-knnbased/) doi:10.1007/978-3-031-43412-9_32
BibTeX
@inproceedings{huegle2023ecmlpkdd-knnbased,
title = {{A KNN-Based Non-Parametric Conditional Independence Test for Mixed Data and Application in Causal Discovery}},
author = {Huegle, Johannes and Hagedorn, Christopher and Schlosser, Rainer},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2023},
pages = {541-558},
doi = {10.1007/978-3-031-43412-9_32},
url = {https://mlanthology.org/ecmlpkdd/2023/huegle2023ecmlpkdd-knnbased/}
}