Provably Correct Automatic Sub-Differentiation for Qualified Programs
Abstract
The \emph{Cheap Gradient Principle}~\citep{Griewank:2008:EDP:1455489} --- the computational cost of computing a $d$-dimensional vector of partial derivatives of a scalar function is nearly the same (often within a factor of $5$) as that of simply computing the scalar function itself --- is of central importance in optimization; it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions which are subsequently used in black-box gradient-based optimization procedures. The current state of affairs is markedly different with regard to computing sub-derivatives: widely used ML libraries, including TensorFlow and PyTorch, do \emph{not} correctly compute (generalized) sub-derivatives even on simple differentiable examples. This work considers the question: is there a \emph{Cheap Sub-gradient Principle}? Our main result shows that, under certain restrictions on our library of non-smooth functions (standard in non-linear programming), provably correct generalized sub-derivatives can be computed at a computational cost that is within a (dimension-free) factor of $6$ of the cost of computing the scalar function itself.
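The kind of failure the abstract alludes to can be reproduced directly. The following minimal PyTorch sketch (an illustration, not code from the paper) evaluates f(x) = relu(x) - relu(-x), which equals x everywhere and therefore has derivative 1 at x = 0; because the library uses the convention relu'(0) = 0, reverse-mode autodiff nonetheless reports 0:

import torch

# f(x) = relu(x) - relu(-x) is identically equal to x, so df/dx = 1 everywhere,
# including at x = 0, where both relu terms are non-differentiable.
x = torch.tensor(0.0, requires_grad=True)
f = torch.relu(x) - torch.relu(-x)
f.backward()
print(x.grad)  # prints tensor(0.): the convention relu'(0) = 0 propagates
               # through both terms, giving 0 instead of the true derivative 1.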
Cite
Text
Kakade and Lee. "Provably Correct Automatic Sub-Differentiation for Qualified Programs." Neural Information Processing Systems, 2018.

Markdown
[Kakade and Lee. "Provably Correct Automatic Sub-Differentiation for Qualified Programs." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/kakade2018neurips-provably/)

BibTeX
@inproceedings{kakade2018neurips-provably,
  title = {{Provably Correct Automatic Sub-Differentiation for Qualified Programs}},
  author = {Kakade, Sham M. and Lee, Jason},
  booktitle = {Neural Information Processing Systems},
  year = {2018},
  pages = {7125--7135},
  url = {https://mlanthology.org/neurips/2018/kakade2018neurips-provably/}
}