Dimension-Independent Rates for Structured Neural Density Estimation

Abstract

We show that deep neural networks can achieve dimension-independent rates of convergence for learning structured densities typical of image, audio, video, and text data. For example, in images, where each pixel becomes independent of the rest of the image when conditioned on pixels at most $t$ steps away, a simple $L^2$-minimizing neural network can attain a rate of $n^{-1/((t+1)^2+4)}$, where $t$ is independent of the ambient dimension $d$, i.e., the total number of pixels. We further provide empirical evidence that, in real-world applications, $t$ is often a small constant, thus effectively circumventing the curse of dimensionality. Moreover, for sequential data (e.g., audio or text) exhibiting a similar local dependence structure, our analysis shows a rate of $n^{-1/(t+5)}$, offering further evidence of dimension independence in practical scenarios.
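A quick numerical sketch can make the dimension-independence claim concrete. The snippet below (not from the paper's code) evaluates the two rate exponents stated in the abstract, $1/((t+1)^2+4)$ for image-like data and $1/(t+5)$ for sequential data, and contrasts them with the classical nonparametric exponent $1/(d+2)$ for Lipschitz densities in $d$ dimensions, which is included here purely as an assumed baseline for comparison.

```python
# Illustrative sketch: compare the paper's structured rate exponents
# with a classical dimension-dependent baseline. The 1/(d+2) exponent
# (Lipschitz densities) is an assumption used only for contrast.

def image_rate_exponent(t: int) -> float:
    """Exponent in the n^{-1/((t+1)^2 + 4)} rate for image-like data."""
    return 1.0 / ((t + 1) ** 2 + 4)

def sequential_rate_exponent(t: int) -> float:
    """Exponent in the n^{-1/(t+5)} rate for sequential data."""
    return 1.0 / (t + 5)

def classical_rate_exponent(d: int) -> float:
    """Classical Lipschitz-density exponent 1/(d+2), for comparison."""
    return 1.0 / (d + 2)

if __name__ == "__main__":
    t = 2          # small local dependence radius
    d = 32 * 32    # ambient dimension of a 32x32 image
    print(f"image rate:      n^(-{image_rate_exponent(t):.4f})")
    print(f"sequential rate: n^(-{sequential_rate_exponent(t):.4f})")
    print(f"classical rate:  n^(-{classical_rate_exponent(d):.6f})")
```

For $t = 2$ the structured exponents are $1/13$ and $1/7$, while the classical exponent for a 1024-pixel image is $1/1026$: the structured rates do not degrade as $d$ grows.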

Cite

Text

Vandermeulen et al. "Dimension-Independent Rates for Structured Neural Density Estimation." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Vandermeulen et al. "Dimension-Independent Rates for Structured Neural Density Estimation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/vandermeulen2025icml-dimensionindependent/)

BibTeX

@inproceedings{vandermeulen2025icml-dimensionindependent,
  title     = {{Dimension-Independent Rates for Structured Neural Density Estimation}},
  author    = {Vandermeulen, Robert A. and Tai, Wai Ming and Aragam, Bryon},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {60857--60879},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/vandermeulen2025icml-dimensionindependent/}
}