On the Approximation Rate of Hierarchical Mixtures-of-Experts for Generalized Linear Models

Abstract

We investigate a class of hierarchical mixtures-of-experts (HME) models in which generalized linear models with nonlinear mean functions of the form ψ(α + x^T β) are mixed. Here ψ(·) is the inverse link function. It is shown that mixtures of such mean functions can approximate a class of smooth functions of the form ψ(h(x)), where h(·) ∈ W^2_∞ (a Sobolev class over [0, 1]^s), as the number of experts m in the network increases. An upper bound on the approximation rate is given as O(m^{−2/s}) in L_p norm. This rate can be achieved within the family of HME structures with no more than s layers, where s is the dimension of the predictor x.
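To make the mean function concrete, here is a minimal sketch of the mixed mean Σ_j g_j(x) ψ(α_j + x^T β_j) for a single-layer mixture of m GLM experts, assuming softmax gates and a logistic inverse link ψ. This is an illustrative toy, not the paper's full hierarchical construction, and all names and parameter shapes are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_mean(x, gate_w, gate_b, alpha, beta, psi):
    """Mixture-of-experts mean: sum_j g_j(x) * psi(alpha_j + x @ beta_j).

    x      : (n, s) predictors in [0, 1]^s
    gate_w : (s, m), gate_b : (m,)  -- softmax gating parameters (assumed form)
    alpha  : (m,), beta : (s, m)    -- per-expert GLM parameters
    psi    : inverse link function
    """
    gates = softmax(x @ gate_w + gate_b)    # (n, m) mixing weights, rows sum to 1
    experts = psi(alpha + x @ beta)         # (n, m) expert mean functions
    return (gates * experts).sum(axis=-1)   # (n,) mixed mean

# Toy example with a logistic inverse link (hypothetical parameters).
rng = np.random.default_rng(0)
s, m, n = 2, 4, 5
x = rng.uniform(0.0, 1.0, (n, s))
psi = lambda t: 1.0 / (1.0 + np.exp(-t))
mu = moe_mean(x, rng.normal(size=(s, m)), np.zeros(m),
              rng.normal(size=m), rng.normal(size=(s, m)), psi)
```

Because the gates form a convex combination and each logistic expert mean lies in (0, 1), the mixed mean also lies in (0, 1); the paper's result bounds how fast such mixtures can approach a smooth target ψ(h(x)) as m grows.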

Cite

Text

Jiang and Tanner. "On the Approximation Rate of Hierarchical Mixtures-of-Experts for Generalized Linear Models." Neural Computation, 1999. doi:10.1162/089976699300016403

Markdown

[Jiang and Tanner. "On the Approximation Rate of Hierarchical Mixtures-of-Experts for Generalized Linear Models." Neural Computation, 1999.](https://mlanthology.org/neco/1999/jiang1999neco-approximation/) doi:10.1162/089976699300016403

BibTeX

@article{jiang1999neco-approximation,
  title     = {{On the Approximation Rate of Hierarchical Mixtures-of-Experts for Generalized Linear Models}},
  author    = {Jiang, Wenxin and Tanner, Martin A.},
  journal   = {Neural Computation},
  year      = {1999},
  pages     = {1183--1198},
  doi       = {10.1162/089976699300016403},
  volume    = {11},
  url       = {https://mlanthology.org/neco/1999/jiang1999neco-approximation/}
}