Inference on High-Dimensional Single-Index Models with Streaming Data
Abstract
Streaming data pose new challenges for traditional statistical methods. The main difficulty is the rapidly growing volume and velocity of the data, which make it impossible to store the entire data set in memory. This paper presents an online inference framework for regression parameters in high-dimensional semiparametric single-index models with unknown link functions. The proposed online procedure uses only the current data batch and summary statistics of historical data, rather than re-accessing the entire raw data set. Moreover, it does not require estimating the unknown link function, which is a highly challenging task. The inference procedure is built on a generalized convex loss function; to illustrate the method, we use the Huber loss function and the negative log-likelihood of the logistic regression model. We investigate the asymptotic normality of the proposed online debiased Lasso estimators and the bounds for the proposed online Lasso estimators. Extensive simulation studies are conducted to evaluate the performance of the proposed method. We also provide applications to Nasdaq stock price and financial distress data sets.
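As a rough illustration of the two convex losses named in the abstract, the following minimal sketch writes them out in Python with NumPy. It is not the paper's procedure: the function names `huber_loss` and `logistic_nll`, the default threshold `delta=1.345` (a commonly used tuning constant), and the 0/1 label coding are all assumptions made here for illustration, and the sketch does not cover the online Lasso or debiasing steps.

```python
import numpy as np


def huber_loss(residual, delta=1.345):
    """Huber loss of a residual r (hypothetical helper, not from the paper).

    Quadratic for |r| <= delta and linear beyond it, so it stays convex
    while limiting the influence of large residuals.
    """
    r = np.abs(np.asarray(residual, dtype=float))
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))


def logistic_nll(y, index):
    """Negative log-likelihood of logistic regression (hypothetical helper).

    Assumes labels y in {0, 1} and a linear index t = x^T beta:
        loss = log(1 + exp(t)) - y * t
    np.logaddexp(0, t) evaluates log(1 + exp(t)) in a numerically stable way.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(index, dtype=float)
    return np.logaddexp(0.0, t) - y * t
```

Both functions are convex in their argument (the residual or the index), which is the property the abstract relies on when it refers to a convex loss.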
Cite
Text
Han et al. "Inference on High-Dimensional Single-Index Models with Streaming Data." Journal of Machine Learning Research, 2024.
Markdown
[Han et al. "Inference on High-Dimensional Single-Index Models with Streaming Data." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/han2024jmlr-inference/)
BibTeX
@article{han2024jmlr-inference,
title = {{Inference on High-Dimensional Single-Index Models with Streaming Data}},
author = {Han, Dongxiao and Xie, Jinhan and Liu, Jin and Sun, Liuquan and Huang, Jian and Jiang, Bei and Kong, Linglong},
journal = {Journal of Machine Learning Research},
year = {2024},
pages = {1-68},
volume = {25},
url = {https://mlanthology.org/jmlr/2024/han2024jmlr-inference/}
}