Statistical Mechanics of the Mixture of Experts

Abstract

We study the generalization capability of the mixture of experts learning from examples generated by another network with the same architecture. When the number of examples is smaller than a critical value, the network shows a symmetric phase where the roles of the experts are not specialized. Upon crossing the critical point, the system undergoes a continuous phase transition to a symmetry-breaking phase where the gating network partitions the input space effectively and each expert is assigned to an appropriate subspace. We also find that the mixture of experts with multiple levels of hierarchy shows multiple phase transitions.

Cite

Text

Kang and Oh. "Statistical Mechanics of the Mixture of Experts." Neural Information Processing Systems, 1996.

Markdown

[Kang and Oh. "Statistical Mechanics of the Mixture of Experts." Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/kang1996neurips-statistical/)

BibTeX

@inproceedings{kang1996neurips-statistical,
  title     = {{Statistical Mechanics of the Mixture of Experts}},
  author    = {Kang, Kukjin and Oh, Jong-Hoon},
  booktitle = {Neural Information Processing Systems},
  year      = {1996},
  pages     = {183--189},
  url       = {https://mlanthology.org/neurips/1996/kang1996neurips-statistical/}
}