The Xyz Algorithm for Fast Interaction Search in High-Dimensional Data

Abstract

When performing regression on a data set with $p$ variables, it is often of interest to go beyond using main linear effects and include interactions as products between individual variables. For small-scale problems, these interactions can be computed explicitly but this leads to a computational complexity of at least $\mathcal{O}(p^2)$ if done naively. This cost can be prohibitive if $p$ is very large. We introduce a new randomised algorithm that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in $p$. We show that strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires $\mathcal{O}(p^\alpha)$ operations for $1R. We demonstrate its efficiency for application to genome-wide association studies, where more than $10^11$ interactions can be screened in under $280$ seconds with a single-core $1.2$ GHz CPU.

Cite

Text

Thanei et al. "The Xyz Algorithm for Fast Interaction Search in High-Dimensional Data." Journal of Machine Learning Research, 2018.

Markdown

[Thanei et al. "The Xyz Algorithm for Fast Interaction Search in High-Dimensional Data." Journal of Machine Learning Research, 2018.](https://mlanthology.org/jmlr/2018/thanei2018jmlr-xyz/)

BibTeX

@article{thanei2018jmlr-xyz,
  title     = {{The Xyz Algorithm for Fast Interaction Search in High-Dimensional Data}},
  author    = {Thanei, Gian-Andrea and Meinshausen, Nicolai and Shah, Rajen D.},
  journal   = {Journal of Machine Learning Research},
  year      = {2018},
  pages     = {1-42},
  volume    = {19},
  url       = {https://mlanthology.org/jmlr/2018/thanei2018jmlr-xyz/}
}