Beyond Gradient Descent for Regularized Segmentation Losses
Abstract
The simplicity of gradient descent (GD) made it the default method for training ever-deeper and complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In the context of weakly-supervised CNN segmentation, we demonstrate a well-motivated loss function where an alternative optimizer (ADM) achieves the state-of-the-art while GD performs poorly. Interestingly, GD obtains its best result for a "smoother" tuning of the loss function. The results are consistent across different network architectures. Our loss is motivated by well-understood MRF/CRF regularization models in "shallow" segmentation and their known global solvers. Our work suggests that network design/training should pay more attention to optimization methods.
Cite
Text
Marin et al. "Beyond Gradient Descent for Regularized Segmentation Losses." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.01043
Markdown
[Marin et al. "Beyond Gradient Descent for Regularized Segmentation Losses." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/marin2019cvpr-beyond/) doi:10.1109/CVPR.2019.01043
BibTeX
@inproceedings{marin2019cvpr-beyond,
title = {{Beyond Gradient Descent for Regularized Segmentation Losses}},
author = {Marin, Dmitrii and Tang, Meng and Ayed, Ismail Ben and Boykov, Yuri},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2019},
doi = {10.1109/CVPR.2019.01043},
url = {https://mlanthology.org/cvpr/2019/marin2019cvpr-beyond/}
}