The Importance of Temperature in Multi-Task Optimization
Abstract
The promise of multi-task learning is that optimizing a single model on multiple related tasks will lead to a better solution for all tasks than independently trained models. In practice, optimization difficulties, such as conflicting gradients, can result in negative transfer, where multi-task models perform worse than single-task models. In this work, we identify the optimization temperature, defined as the ratio of learning rate to batch size, as a key factor in negative transfer. Temperature controls the level of noise in each optimization step, which prior work has shown to have a strong correlation with generalization. We demonstrate that, in some multi-task settings, negative transfer may arise due to poorly set optimization temperature, rather than inherently high task conflict. The implication of this finding is that in some settings, SGD with a carefully controlled temperature achieves comparable, and in some cases superior, performance to that of specialized optimization procedures such as PCGrad, MGDA, and GradNorm. In particular, our results suggest that the significant additional computational burden of these specialized methods may not always be necessary. Finally, we observe a conflict between the optimal temperatures of different tasks in a multi-task objective, with different levels of noise promoting better generalization for different tasks. Our work suggests the need for novel multi-task optimization methods that consider individual task noise levels and their impact on generalization.
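As a rough illustration of the quantity the abstract defines, the sketch below computes the temperature as learning rate divided by batch size, and shows how a learning rate could be chosen to hold that ratio fixed when the batch size changes. The function names `temperature` and `learning_rate_for` and the numeric values are hypothetical, chosen only for this example; they are not taken from the paper.

```python
def temperature(learning_rate: float, batch_size: int) -> float:
    """Optimization temperature as described in the abstract:
    the ratio of learning rate to batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return learning_rate / batch_size


def learning_rate_for(target_temperature: float, batch_size: int) -> float:
    """Learning rate that gives the target temperature at a given batch size
    (hypothetical helper, simply inverting the ratio above)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return target_temperature * batch_size


# Illustrative values only (assumed, not from the paper).
base_temperature = temperature(learning_rate=0.1, batch_size=32)

# Doubling the batch size while keeping the temperature fixed
# requires doubling the learning rate.
scaled_lr = learning_rate_for(base_temperature, batch_size=64)
```

Under this definition, a temperature sweep could be run by varying either the learning rate or the batch size; which of the two the authors vary is not stated in the abstract.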
Cite
Text
Mueller et al. "The Importance of Temperature in Multi-Task Optimization." NeurIPS 2022 Workshops: OPT, 2022.
Markdown
[Mueller et al. "The Importance of Temperature in Multi-Task Optimization." NeurIPS 2022 Workshops: OPT, 2022.](https://mlanthology.org/neuripsw/2022/mueller2022neuripsw-importance/)
BibTeX
@inproceedings{mueller2022neuripsw-importance,
title = {{The Importance of Temperature in Multi-Task Optimization}},
author = {Mueller, David and Dredze, Mark and Andrews, Nicholas},
booktitle = {NeurIPS 2022 Workshops: OPT},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/mueller2022neuripsw-importance/}
}