Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss
Abstract
In this work, we study statistical learning with dependent data and the square loss in a hypothesis class with tail decay in Orlicz space: $\mathscr{F}\subset L_{\Psi_p}$. Our inquiry is motivated by the search for a sharp noise interaction term, or variance proxy, in learning with dependent (e.g., $\beta$-mixing) data. Typical non-asymptotic results exhibit variance proxies that are deflated multiplicatively in the mixing time of the underlying covariates process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, the empirical risk minimizer achieves a rate whose leading term depends only on the complexity of the class and second-order statistics. We refer to this as a near mixing-free rate, since direct dependence on mixing is relegated to an additive higher-order term. Our approach, reliant on mixed tail generic chaining, allows us to obtain sharp, instance-optimal rates. Examples that satisfy our framework include, for instance, sub-Gaussian linear regression and bounded smoothness classes.
Cite
Text
Ziemann et al. "Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss." International Conference on Machine Learning, 2024.
Markdown
[Ziemann et al. "Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/ziemann2024icml-sharp/)
BibTeX
@inproceedings{ziemann2024icml-sharp,
title = {{Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss}},
author = {Ziemann, Ingvar and Tu, Stephen and Pappas, George J. and Matni, Nikolai},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {62779-62802},
volume = {235},
url = {https://mlanthology.org/icml/2024/ziemann2024icml-sharp/}
}