Improving Continual Learning by Accurate Gradient Reconstructions of the Past

Abstract

Weight-regularization and experience replay are two popular continual-learning strategies with complementary strengths: weight-regularization requires less memory, while replay can more accurately mimic batch training. How can we combine them to get better methods? Despite the simplicity of the question, little is known about how to optimally combine these approaches. In this paper, we present such a method, using a recently proposed principle of adaptation that relies on a faithful reconstruction of the gradients of the past data. Using this principle, we design a prior that combines two types of replay with a quadratic weight-regularizer and thereby achieves more accurate gradient reconstructions. The combination improves performance on standard task-incremental continual-learning benchmarks such as Split-CIFAR, Split-TinyImageNet, and ImageNet-1000, achieving $>\!80\%$ of the batch performance while storing only $<\!10\%$ of the past data. Our work shows that a good combination of the two strategies can be very effective in reducing forgetting.
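
To make the idea concrete, the display below is a minimal sketch of the general form such a combined objective can take when training on task $t$. The previous solution $w_{t-1}$, the memory set $\mathcal{M}$ of stored past examples, the divergence $D$ between model outputs $f_w$, and the curvature matrix $A_{t-1}$ are illustrative placeholders, not the paper's exact formulation:

$$
\mathcal{L}_t(w) \;=\; \underbrace{\ell(\mathcal{D}_t, w)}_{\text{new-task loss}}
\;+\; \underbrace{\sum_{x \in \mathcal{M}} D\!\big(f_w(x),\, f_{w_{t-1}}(x)\big)}_{\text{replay on a small memory}}
\;+\; \underbrace{\tfrac{1}{2}\,(w - w_{t-1})^\top A_{t-1}\,(w - w_{t-1})}_{\text{quadratic weight-regularizer}} .
$$

In such a construction, the replay term supplies the data-dependent part of the past gradient from the stored examples, while the quadratic term is meant to account for what the small memory misses, so that $\nabla_w \mathcal{L}_t(w)$ approximately reconstructs the gradient of batch training on all data seen so far.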

Cite

Text

Daxberger et al. "Improving Continual Learning by Accurate Gradient Reconstructions of the Past." Transactions on Machine Learning Research, 2023.

Markdown

[Daxberger et al. "Improving Continual Learning by Accurate Gradient Reconstructions of the Past." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/daxberger2023tmlr-improving/)

BibTeX

@article{daxberger2023tmlr-improving,
  title     = {{Improving Continual Learning by Accurate Gradient Reconstructions of the Past}},
  author    = {Daxberger, Erik and Swaroop, Siddharth and Osawa, Kazuki and Yokota, Rio and Turner, Richard E. and Hernández-Lobato, José Miguel and Khan, Mohammad Emtiyaz},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/daxberger2023tmlr-improving/}
}