Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages
Abstract
Many algorithms have been proposed to learn local graphical structures around target variables of interest from observational data. The Markov boundary (MB) provides a complete picture of the local causal structure around a variable and is a theoretically optimal solution for the feature selection problem. Available algorithms for MB discovery have focused on various challenges such as scalability and data-efficiency. However, current approaches do not provide guarantees in terms of false discoveries in the MB. In this paper we introduce a novel algorithm for the MB discovery problem with rigorous guarantees on the Family-Wise Error Rate (FWER), that is, the probability of reporting at least one false positive. Our algorithm uses Rademacher averages, a key concept from statistical learning theory, to properly account for the multiple-hypothesis testing problem arising in MB discovery. Our evaluation on simulated data shows that our algorithm properly controls the FWER, while widely used algorithms do not provide guarantees on false discoveries even when correcting for multiple-hypothesis testing. Our experiments also show that our algorithm identifies meaningful relations in real-world data.
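To make the central quantity concrete, the following is a minimal sketch of a Monte Carlo estimate of the empirical Rademacher average of a finite family of functions evaluated on a sample. It is not the authors' algorithm: the function name `empirical_rademacher_average`, the `values` layout, and the `n_draws` and `seed` parameters are assumptions made for illustration only, and the sketch does not show how the paper turns this quantity into a bound on the FWER.

```python
import random


def empirical_rademacher_average(values, n_draws=200, seed=0):
    """Monte Carlo estimate of E_sigma[ max_f (1/n) * sum_i sigma_i * f(s_i) ].

    values[f][i] holds the value of function f on the i-th sample point
    (hypothetical layout chosen for this sketch). Each sigma_i is drawn
    uniformly from {-1, +1}.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(n_draws):
        # One vector of independent Rademacher signs, one sign per sample point.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Largest signed average over the whole family for this draw.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / n for row in values
        )
    return total / n_draws
```

Because the maximum is taken over the whole family at once, the estimate grows with the richness of the family, which is the intuition behind using such a quantity to account for many simultaneous hypotheses rather than correcting each test separately.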
Cite
Text
Simionato and Vandin. "Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26419-1_16
Markdown
[Simionato and Vandin. "Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/simionato2022ecmlpkdd-bounding/) doi:10.1007/978-3-031-26419-1_16
BibTeX
@inproceedings{simionato2022ecmlpkdd-bounding,
title = {{Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages}},
author = {Simionato, Dario and Vandin, Fabio},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2022},
pages = {255-271},
doi = {10.1007/978-3-031-26419-1_16},
url = {https://mlanthology.org/ecmlpkdd/2022/simionato2022ecmlpkdd-bounding/}
}