Sparse Contextual CDF Regression

Abstract

Estimating cumulative distribution functions (CDFs) of context-dependent random variables is a central statistical task underpinning numerous applications in machine learning and economics. In this work, we extend a recent line of theoretical inquiry into this domain by analyzing the problem of \emph{sparse contextual CDF regression}, wherein data points are sampled from a convex combination of $s$ context-dependent CDFs chosen from a set of $d$ basis functions. We show that adaptations of several canonical regression methods serve as tractable estimators in this functional sparse regression setting under standard assumptions on the conditioning of the basis functions. In particular, given $n$ data samples, we prove estimation error upper bounds of $\tilde{O}(\sqrt{s/n})$ for functional versions of the lasso and Dantzig selector estimators, and $\tilde{O}(\sqrt{s}/\sqrt[4]{n})$ for a functional version of the elastic net estimator. Our results match the corresponding error bounds for finite-dimensional regression and improve upon CDF ridge regression, which has $\tilde{O}(\sqrt{d/n})$ estimation error. Finally, we obtain a matching information-theoretic lower bound, which establishes the minimax optimality of the lasso and Dantzig selector estimators up to logarithmic factors.

Cite

Text

Azizzadenesheli et al. "Sparse Contextual CDF Regression." Transactions on Machine Learning Research, 2024.

Markdown

[Azizzadenesheli et al. "Sparse Contextual CDF Regression." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/azizzadenesheli2024tmlr-sparse/)

BibTeX

@article{azizzadenesheli2024tmlr-sparse,
  title     = {{Sparse Contextual CDF Regression}},
  author    = {Azizzadenesheli, Kamyar and Lu, William and Makur, Anuran and Zhang, Qian},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/azizzadenesheli2024tmlr-sparse/}
}