Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data

Abstract

We present a new non-negative matrix factorization model for $(0,1)$ bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distribution is particularly useful for modeling DNA methylation datasets, which are typically highly dispersed and multi-modal; however, the model structure is sufficiently general that it can be adapted to many other domains where latent representations of $(0,1)$ bounded-support data are of interest. Although the DNCB distribution lacks a closed-form conjugate prior, several augmentations let us derive an efficient posterior inference algorithm composed entirely of analytic updates. Our model improves out-of-sample predictive performance on both real and synthetic DNA methylation datasets over state-of-the-art methods in bioinformatics. In addition, our model yields meaningful latent representations that accord with existing biological knowledge.
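As a rough illustration of why the DNCB distribution generalizes the beta distribution, it can be written as a beta distribution whose two shape parameters are each shifted by an independent Poisson count. The sketch below draws samples that way using only the Python standard library. The function and parameter names (`sample_poisson`, `sample_dncb`, `shape1`, `shape2`, `rate1`, `rate2`) are hypothetical, and the use of the non-centrality parameters directly as Poisson rates is an assumption: some references use half the non-centrality parameter instead. This is not the paper's inference algorithm, only a forward sampler for the distribution.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method.

    Adequate for small to moderate rates; exp(-rate) underflows for very
    large rates, so this is only a sketch.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_dncb(shape1, shape2, rate1, rate2, rng):
    """Draw one sample from a doubly non-central beta distribution.

    Assumed parameterization: the non-centrality parameters rate1 and
    rate2 are used directly as Poisson rates.
    """
    y1 = sample_poisson(rate1, rng)  # latent count shifting the first shape
    y2 = sample_poisson(rate2, rng)  # latent count shifting the second shape
    return rng.betavariate(shape1 + y1, shape2 + y2)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_dncb(1.0, 1.0, 5.0, 0.0, rng) for _ in range(5)]
    print(draws)
```

Setting both rates to zero recovers an ordinary beta distribution with shapes `shape1` and `shape2`, while raising only `rate1` pushes mass toward 1 and raising only `rate2` pushes it toward 0.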

Cite

Text

Schein et al. "Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data." Uncertainty in Artificial Intelligence, 2021.

Markdown

[Schein et al. "Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data." Uncertainty in Artificial Intelligence, 2021.](https://mlanthology.org/uai/2021/schein2021uai-doubly/)

BibTeX

@inproceedings{schein2021uai-doubly,
  title     = {{Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data}},
  author    = {Schein, Aaron and Nagulpally, Anjali and Wallach, Hanna and Flaherty, Patrick},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2021},
  pages     = {1895--1904},
  volume    = {161},
  url       = {https://mlanthology.org/uai/2021/schein2021uai-doubly/}
}