Differentiable Dynamic Programming for Structured Prediction and Attention
Abstract
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.
Cite
Text
Mensch and Blondel. "Differentiable Dynamic Programming for Structured Prediction and Attention." International Conference on Machine Learning, 2018.Markdown
[Mensch and Blondel. "Differentiable Dynamic Programming for Structured Prediction and Attention." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/mensch2018icml-differentiable/)BibTeX
@inproceedings{mensch2018icml-differentiable,
title = {{Differentiable Dynamic Programming for Structured Prediction and Attention}},
author = {Mensch, Arthur and Blondel, Mathieu},
booktitle = {International Conference on Machine Learning},
year = {2018},
pages = {3462-3471},
volume = {80},
url = {https://mlanthology.org/icml/2018/mensch2018icml-differentiable/}
}