Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data via Amalgamation
Abstract
Compositional data with a large number of components and an abundance of zeros have recently become common in many fields. Analyzing such sparse high-dimensional compositional data naturally calls for dimension reduction or, preferably, variable selection. Most existing approaches lack interpretability or cannot handle zeros properly, as they rely on a log-ratio transformation. We approach this problem with sufficient dimension reduction (SDR), one of the most studied dimension reduction frameworks in statistics. Characterized by the conditional independence of the response and the data given the found subspace, the SDR framework has been effective for both linear and nonlinear dimension reduction problems. This work proposes a compositional SDR that handles zeros naturally while rigorously accounting for the nonlinear nature of compositional data and the spurious negative correlations among components. A critical consideration of sub-composition versus amalgamation for compositional variable selection is discussed. The proposed compositional SDR is shown to be statistically consistent in constructing a sub-simplex consisting of the true signal variables. Simulated and real microbiome data are used to demonstrate the performance of the proposed SDR compared to existing state-of-the-art approaches.
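To make the distinction between sub-composition and amalgamation concrete, the following minimal sketch applies both operations to a toy composition containing zeros. It uses only the standard textbook definitions of the two operations; the function names and the example vector are hypothetical, and this is not the authors' SDR procedure.

```python
import math

def subcomposition(x, keep):
    """Keep the selected parts and rescale them to sum to 1 (closure)."""
    sub = [x[i] for i in keep]
    total = sum(sub)
    return [v / total for v in sub]

def amalgamate(x, groups):
    """Sum the parts within each group; the result still lies on a simplex."""
    return [sum(x[i] for i in g) for g in groups]

# Hypothetical 5-part composition with two zero parts.
x = [0.5, 0.0, 0.3, 0.2, 0.0]

# Sub-composition: parts 0 and 2 are renormalized, so their values change.
sub = subcomposition(x, [0, 2])

# Amalgamation: parts 0 and 2 keep their original values and the
# remaining parts (including the zeros) are pooled into one part.
amal = amalgamate(x, [[0], [2], [1, 3, 4]])
```

Neither operation takes a logarithm, so the zero parts cause no difficulty here, whereas a log-ratio transformation of `x` would be undefined for those parts.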
Cite
Text
Park et al. "Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data via Amalgamation." International Conference on Machine Learning, 2023.
Markdown
[Park et al. "Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data via Amalgamation." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/park2023icml-kernel/)
BibTeX
@inproceedings{park2023icml-kernel,
title = {{Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data via Amalgamation}},
author = {Park, Junyoung and Ahn, Jeongyoun and Park, Cheolwoo},
booktitle = {International Conference on Machine Learning},
year = {2023},
pages = {27034-27047},
volume = {202},
url = {https://mlanthology.org/icml/2023/park2023icml-kernel/}
}