Conic Activation Functions

Abstract

Most activation functions operate component-wise, which restricts the equivariance of neural networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the symmetry of neural networks to continuous orthogonal groups. By interpreting ReLU as a projection onto its invariant set, the positive orthant, we propose a conic activation function that projects onto a Lorentz cone instead. Its performance can be further improved with multi-head structures, soft scaling, and axis sharing. CoLU with low-dimensional cones outperforms component-wise ReLU across a wide range of models, including MLPs, ResNets, and UNets, achieving better loss values and faster convergence. It significantly improves the training and performance of diffusion models. CoLU originates from a first-principles approach to various forms of neural networks and fundamentally changes their algebraic structure.
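
The construction described in the abstract can be sketched briefly: just as ReLU is the Euclidean projection onto the positive orthant, the conic activation projects each feature vector onto a second-order (Lorentz) cone. Below is a minimal, hypothetical PyTorch sketch of that projection; the function name, the head split in the usage example, and the omission of the paper's soft-scaling and axis-sharing variants are assumptions for illustration, not the authors' implementation.

import torch

def lorentz_cone_projection(z: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    # Euclidean projection of each feature vector z = (t, x) onto the
    # second-order (Lorentz) cone {(t, x) : ||x||_2 <= t}, applied over the
    # last dimension. With a 1-dimensional cone this reduces to ReLU.
    t, x = z[..., :1], z[..., 1:]
    r = torch.linalg.vector_norm(x, dim=-1, keepdim=True)  # ||x||_2
    inside = r <= t          # already inside the cone: keep unchanged
    below = r <= -t          # inside the polar cone: project to the origin
    alpha = 0.5 * (t + r)    # otherwise: scale onto the cone boundary
    proj = torch.cat([alpha, alpha * x / (r + eps)], dim=-1)
    out = torch.where(inside, z, proj)
    return torch.where(below, torch.zeros_like(z), out)

# Example (hypothetical multi-head split): 64 channels as 16 cones of dimension 4.
h = torch.randn(32, 64)
activated = lorentz_cone_projection(h.view(32, 16, 4)).reshape(32, 64)

Splitting the channels into many low-dimensional cones mirrors the abstract's observation that CoLU with low-dimensional cones performs best; the choice of 4-dimensional cones here is only an example.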

Cite

Text

Fu and Cohen. "Conic Activation Functions." NeurIPS 2024 Workshops: UniReps, 2024.

Markdown

[Fu and Cohen. "Conic Activation Functions." NeurIPS 2024 Workshops: UniReps, 2024.](https://mlanthology.org/neuripsw/2024/fu2024neuripsw-conic/)

BibTeX

@inproceedings{fu2024neuripsw-conic,
  title     = {{Conic Activation Functions}},
  author    = {Fu, Changqing and Cohen, Laurent D.},
  booktitle = {NeurIPS 2024 Workshops: UniReps},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/fu2024neuripsw-conic/}
}