Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

Abstract

In this work, we study statistical learning with dependent data and square loss in a hypothesis class $\mathscr{F}\subset L_{\Psi_p}$ with tail decay controlled by an Orlicz-type norm $\Psi_p$. Our inquiry is motivated by the search for a sharp noise interaction term, or variance proxy, in learning with dependent (e.g., $\beta$-mixing) data. Typical non-asymptotic results exhibit variance proxies that are deflated multiplicatively by the mixing time of the underlying covariates process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, the empirical risk minimizer achieves a rate that depends only on the complexity of the class and on second-order statistics in its leading term. We refer to this as a near mixing-free rate, since direct dependence on mixing is relegated to an additive higher-order term. Our approach, which relies on mixed-tail generic chaining, allows us to obtain sharp, instance-optimal rates. Examples that satisfy our framework include sub-Gaussian linear regression and bounded smoothness classes.
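
For concreteness, a sketch of the comparability condition referenced above, with notation as we read it from the paper (treat this rendering, including the exponent $\eta$, as our reading rather than a verbatim quote):

$$
\|f\|_{\Psi_p} \;\triangleq\; \sup_{m \geq 1} m^{-1/p}\, \|f\|_{L^{2m}}, \qquad p \in [2, \infty],
$$

and the topologies of $L^2$ and $\Psi_p$ are comparable on $\mathscr{F}$ when there exist a constant $C > 0$ and an exponent $\eta \in (0, 1]$ such that $\|f\|_{\Psi_p} \leq C\, \|f\|_{L^2}^{\eta}$ for every $f \in \mathscr{F}$ (a weakly sub-Gaussian class).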

Cite

Text

Ziemann et al. "Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss." International Conference on Machine Learning, 2024.

Markdown

[Ziemann et al. "Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/ziemann2024icml-sharp/)

BibTeX

@inproceedings{ziemann2024icml-sharp,
  title     = {{Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss}},
  author    = {Ziemann, Ingvar and Tu, Stephen and Pappas, George J. and Matni, Nikolai},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {62779--62802},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/ziemann2024icml-sharp/}
}