Binarsity: A Penalization for One-Hot Encoded Features in Linear Supervised Learning

Abstract

This paper deals with the problem of large-scale linear supervised learning in settings where a large number of continuous features are available. We propose to combine the well-known trick of one-hot encoding of continuous features with a new penalization called binarsity. In each group of binary features coming from the one-hot encoding of a single raw continuous feature, this penalization uses total-variation regularization together with an extra linear constraint. This induces two interesting properties of the model weights of the one-hot encoded features: they are piecewise constant and eventually block sparse. Non-asymptotic oracle inequalities for generalized linear models are established. Moreover, under a sparse additive model assumption, we prove that our procedure matches the state of the art in this setting. Numerical experiments illustrate the good performance of our approach on several datasets. Our method also has a numerical complexity comparable to standard $\ell_1$ penalization.
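As a rough illustration of the pipeline described in the abstract, the sketch below one-hot encodes continuous features by quantile binning and evaluates a binarsity-style penalty: a total variation of the weights within each group, with each group centered as a stand-in for the extra linear constraint. The bin count, weighting, and centering step are assumptions made for this sketch, not the authors' exact formulation.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# One-hot encode each raw continuous feature into quantile bins
# (assumed setup; the number of bins is a free parameter here).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # 3 raw continuous features
n_bins = 10
binarizer = KBinsDiscretizer(n_bins=n_bins, encode="onehot-dense",
                             strategy="quantile")
X_bin = binarizer.fit_transform(X)     # shape (500, 3 * n_bins)

def binarsity_penalty(theta, n_bins, strength=1.0):
    """Within-group total variation of the weights, after centering each
    group to mimic the extra linear constraint (illustrative only)."""
    penalty = 0.0
    for j in range(theta.size // n_bins):
        block = theta[j * n_bins:(j + 1) * n_bins]
        block = block - block.mean()             # stand-in for the constraint
        penalty += np.abs(np.diff(block)).sum()  # total variation in the group
    return strength * penalty

theta = rng.normal(size=X_bin.shape[1])
print(binarsity_penalty(theta, n_bins))
```

In the actual procedure this penalty enters a penalized empirical risk and is handled by a proximal algorithm; the sketch only shows the encoding and the shape of the penalty term.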

Cite

Text

Alaya et al. "Binarsity: A Penalization for One-Hot Encoded Features in Linear Supervised Learning." Journal of Machine Learning Research, 2019.

Markdown

[Alaya et al. "Binarsity: A Penalization for One-Hot Encoded Features in Linear Supervised Learning." Journal of Machine Learning Research, 2019.](https://mlanthology.org/jmlr/2019/alaya2019jmlr-binarsity/)

BibTeX

@article{alaya2019jmlr-binarsity,
  title     = {{Binarsity: A Penalization for One-Hot Encoded Features in Linear Supervised Learning}},
  author    = {Alaya, Mokhtar Z. and Bussy, Simon and Gaïffas, Stéphane and Guilloux, Agathe},
  journal   = {Journal of Machine Learning Research},
  year      = {2019},
  pages     = {1--34},
  volume    = {20},
  url       = {https://mlanthology.org/jmlr/2019/alaya2019jmlr-binarsity/}
}