Conic Activation Functions
Abstract
Most activation functions operate component-wise, which restricts the equivariance of neural networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the symmetry of neural networks to continuous orthogonal groups. By interpreting ReLU as a projection onto its invariant set, the positive orthant, we propose a conic activation function that projects onto a Lorentz cone instead. Its performance can be further improved with multi-head structures, soft scaling, and axis sharing. CoLU with low-dimensional cones outperforms component-wise ReLU across a wide range of models, including MLPs, ResNets, and UNets, achieving lower loss values and faster convergence. It significantly improves the training and performance of diffusion models. CoLU originates from a first-principles approach to various forms of neural networks and fundamentally changes their algebraic structure.
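The abstract interprets ReLU as the projection onto the positive orthant and replaces that orthant with a Lorentz cone. As a minimal sketch of that idea (not the authors' implementation), the PyTorch snippet below applies the standard closed-form Euclidean projection onto the second-order cone K = {(t, x) : ||x|| <= t} to each group of channels; the function name, the `head_dim` parameter, and the convention that the first coordinate of each head acts as the cone axis are illustrative assumptions.

```python
# Illustrative sketch of a conic activation as a Lorentz-cone projection.
# Assumptions (not from the paper): channels are split into heads of size
# `head_dim`, and the first coordinate of each head is the cone axis.
import torch


def lorentz_cone_projection(z: torch.Tensor, head_dim: int = 4) -> torch.Tensor:
    """Project each head of `head_dim` channels onto K = {(t, x): ||x|| <= t}.

    z: tensor of shape (..., C) with C divisible by head_dim.
    """
    *batch, C = z.shape
    z = z.reshape(*batch, C // head_dim, head_dim)
    t, x = z[..., :1], z[..., 1:]        # cone-axis coordinate and remaining coordinates
    r = x.norm(dim=-1, keepdim=True)     # radial norm per head

    # Closed-form projection onto the second-order cone:
    #   ||x|| <= t   -> point is inside the cone, keep it
    #   ||x|| <= -t  -> point is in the polar cone, map to zero
    #   otherwise    -> scale onto the cone boundary
    boundary_t = ((t + r) / 2).clamp(min=0.0)
    scale = boundary_t / r.clamp(min=1e-12)
    t_proj = torch.where(r <= t, t, boundary_t)
    x_proj = torch.where(r <= t, x, scale * x)

    out = torch.cat([t_proj, x_proj], dim=-1)
    return out.reshape(*batch, C)


# Example: a batch of 16 feature vectors with 64 channels, 4 channels per cone.
y = lorentz_cone_projection(torch.randn(16, 64), head_dim=4)
```

With `head_dim=1` each cone degenerates to the half-line [0, inf) and the map reduces to component-wise ReLU, which is one way to see the construction as a generalization rather than a replacement.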
Cite
Text
Fu and Cohen. "Conic Activation Functions." NeurIPS 2024 Workshops: UniReps, 2024.
Markdown
[Fu and Cohen. "Conic Activation Functions." NeurIPS 2024 Workshops: UniReps, 2024.](https://mlanthology.org/neuripsw/2024/fu2024neuripsw-conic/)
BibTeX
@inproceedings{fu2024neuripsw-conic,
  title     = {{Conic Activation Functions}},
  author    = {Fu, Changqing and Cohen, Laurent D.},
  booktitle = {NeurIPS 2024 Workshops: UniReps},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/fu2024neuripsw-conic/}
}