Directed Graphical Models and Causal Discovery for Zero-Inflated Data

Abstract

With advances in technology, gene expression measurements from single cells can be used to gain refined insights into regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their $0/1$ indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell gene expression data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.
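As a rough illustration of the modeling idea in the abstract, the following is a minimal sketch of sampling from a two-node zero-inflated model in which the child's probability of being nonzero and its mean when nonzero both depend on the parent's value and on the parent's $0/1$ nonzero indicator. It is not the authors' parametrization or estimation method. The function names (`sample_hurdle_root`, `sample_hurdle_child`), the logistic link, the degree-1 polynomial features, and all coefficient values are assumptions chosen only for illustration.

```python
import numpy as np

def sample_hurdle_root(n, p_nonzero, mean, sd, rng):
    """Draw n values that are exactly 0 with probability 1 - p_nonzero, else Gaussian."""
    nonzero = rng.random(n) < p_nonzero
    values = rng.normal(mean, sd, size=n)
    return np.where(nonzero, values, 0.0)

def sample_hurdle_child(x, logit_coef, mean_coef, sd, rng):
    """Draw a child given parent x; both parts depend on x and on 1{x != 0}."""
    z = (x != 0).astype(float)  # 0/1 indicator of the parent being nonzero
    # Degree-1 polynomial features (assumed): intercept, indicator, parent value.
    design = np.column_stack([np.ones_like(x), z, x])
    # Hurdle part: probability that the child is nonzero (assumed logistic link).
    p_nonzero = 1.0 / (1.0 + np.exp(-(design @ logit_coef)))
    nonzero = rng.random(x.shape[0]) < p_nonzero
    # Continuous part: Gaussian value used only when the child is nonzero.
    values = rng.normal(design @ mean_coef, sd)
    return np.where(nonzero, values, 0.0)

rng = np.random.default_rng(0)
# Hypothetical coefficients, not taken from the paper.
x = sample_hurdle_root(5000, p_nonzero=0.6, mean=1.0, sd=1.0, rng=rng)
y = sample_hurdle_child(
    x,
    logit_coef=np.array([-1.0, 1.5, 0.5]),
    mean_coef=np.array([0.5, 0.3, 0.8]),
    sd=1.0,
    rng=rng,
)
zero_fraction_x = float(np.mean(x == 0))
zero_fraction_y = float(np.mean(y == 0))
```

Both simulated variables contain a point mass at zero alongside a continuous component, which is the zero-inflated pattern the abstract describes for single-cell expression data.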

Cite

Text

Yu et al. "Directed Graphical Models and Causal Discovery for Zero-Inflated Data." Proceedings of the Second Conference on Causal Learning and Reasoning, 2023.

Markdown

[Yu et al. "Directed Graphical Models and Causal Discovery for Zero-Inflated Data." Proceedings of the Second Conference on Causal Learning and Reasoning, 2023.](https://mlanthology.org/clear/2023/yu2023clear-directed/)

BibTeX

@inproceedings{yu2023clear-directed,
  title     = {{Directed Graphical Models and Causal Discovery for Zero-Inflated Data}},
  author    = {Yu, Shiqing and Drton, Mathias and Shojaie, Ali},
  booktitle = {Proceedings of the Second Conference on Causal Learning and Reasoning},
  year      = {2023},
  pages     = {27--67},
  volume    = {213},
  url       = {https://mlanthology.org/clear/2023/yu2023clear-directed/}
}