HyperNetworks

Abstract

This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. We apply hypernetworks to generate adaptive weights for recurrent networks. In this case, hypernetworks can be viewed as a relaxed form of weight-sharing across layers. In our implementation, hypernetworks are trained jointly with the main network in an end-to-end fashion. Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve state-of-the-art results on a variety of sequence modelling tasks including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks.
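The core idea from the abstract can be sketched in a few lines: a small hypernetwork maps a per-layer embedding to the weight matrix of a main-network layer, so layers that share the hypernetwork still receive distinct weights through distinct embeddings (the "relaxed weight-sharing" described above). This is a minimal NumPy illustration, not the paper's implementation; all dimensions and the linear hypernetwork form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
d_in, d_out = 4, 3   # main-layer input/output dimensions
d_z = 2              # size of the per-layer embedding z

# Hypernetwork parameters: a single linear map from an embedding z
# to the flattened weight matrix of the main layer.
H = rng.normal(scale=0.1, size=(d_out * d_in, d_z))
b = np.zeros(d_out * d_in)

def hypernet(z):
    """Generate a main-layer weight matrix from embedding z."""
    return (H @ z + b).reshape(d_out, d_in)

# Two layers share the hypernetwork (H, b) but get non-shared
# weights via different embeddings -- relaxed weight-sharing.
z1, z2 = rng.normal(size=d_z), rng.normal(size=d_z)
W1, W2 = hypernet(z1), hypernet(z2)

# Forward pass of one main-network layer with generated weights.
x = rng.normal(size=d_in)
h = np.tanh(W1 @ x)
```

In training, gradients would flow through the generated weights back into `H`, `b`, and the embeddings, so the hypernetwork and main network are learned jointly end-to-end, as the abstract describes.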

Cite

Text

Ha et al. "HyperNetworks." International Conference on Learning Representations, 2017.

Markdown

[Ha et al. "HyperNetworks." International Conference on Learning Representations, 2017.](https://mlanthology.org/iclr/2017/ha2017iclr-hypernetworks/)

BibTeX

@inproceedings{ha2017iclr-hypernetworks,
  title     = {{HyperNetworks}},
  author    = {Ha, David and Dai, Andrew M. and Le, Quoc V.},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
  url       = {https://mlanthology.org/iclr/2017/ha2017iclr-hypernetworks/}
}