Provably Correct Automatic Sub-Differentiation for Qualified Programs

Abstract

The *Cheap Gradient Principle* of Griewank and Walther (2008) states that the computational cost of computing a $d$-dimensional vector of partial derivatives of a scalar function is nearly the same (often within a factor of $5$) as that of computing the scalar function itself. This principle is of central importance in optimization: it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions, which are then used in black-box gradient-based optimization procedures. The current state of affairs is markedly different with regard to computing sub-derivatives: widely used ML libraries, including TensorFlow and PyTorch, do *not* correctly compute (generalized) sub-derivatives, even on simple differentiable examples. This work considers the question: is there a *Cheap Sub-gradient Principle*? Our main result shows that, under certain restrictions on the library of non-smooth functions (restrictions standard in non-linear programming), provably correct generalized sub-derivatives can be computed at a computational cost within a (dimension-free) factor of $6$ of the cost of computing the scalar function itself.
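
To illustrate the kind of failure the abstract refers to, here is a minimal PyTorch sketch (an illustrative example of this class of problem, not necessarily the paper's own construction). The function f(x) = relu(x) - relu(-x) is identically equal to x, so its derivative is 1 everywhere, yet reverse-mode automatic differentiation reports 0 at x = 0 because each relu primitive uses the convention relu'(0) = 0.

import torch

# f(x) = relu(x) - relu(-x) equals x for every x, so the true derivative
# is 1 everywhere, including at the perfectly differentiable point x = 0.
x = torch.tensor(0.0, requires_grad=True)
f = torch.relu(x) - torch.relu(-x)
f.backward()

print(f.item())  # 0.0, consistent with f(0) = 0
print(x.grad)    # tensor(0.): autodiff returns 0 rather than the true derivative 1,
                 # because the relu'(0) := 0 convention propagates through the chain rule

Each relu in isolation reports a valid element of its own subdifferential at 0, but composing these per-primitive choices via the chain rule need not yield a valid generalized sub-derivative of the composite function; this is the failure mode the paper addresses with its provably correct procedure for qualified programs.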

Cite

Text

Kakade and Lee. "Provably Correct Automatic Sub-Differentiation for Qualified Programs." Neural Information Processing Systems, 2018.

Markdown

[Kakade and Lee. "Provably Correct Automatic Sub-Differentiation for Qualified Programs." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/kakade2018neurips-provably/)

BibTeX

@inproceedings{kakade2018neurips-provably,
  title     = {{Provably Correct Automatic Sub-Differentiation for Qualified Programs}},
  author    = {Kakade, Sham M. and Lee, Jason},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {7125--7135},
  url       = {https://mlanthology.org/neurips/2018/kakade2018neurips-provably/}
}