THOR, Trace-Based Hardware-Driven Layer-Oriented Natural Gradient Descent Computation

Chen, Mengyun; Gao, Kai-Xin; Liu, Xiaolei; Wang, Zidong; Ni, Ningxi; Zhang, Qian; Chen, Lei; Ding, Chao; Huang, Zheng-Hai; Wang, Min; Wang, Shuangling; Yu, Fan; Zhao, Xinyuan; Xu, Dachuan

doi:10.1609/AAAI.V35I8.16867

AAAI 2021 pp. 7046-7054

Abstract

It is well known that second-order optimizers can accelerate the training of deep neural networks; however, the huge computational cost of second-order optimization makes it impractical to apply in practice. To reduce this cost, many methods have been proposed to approximate the second-order matrix. Inspired by KFAC, we propose a novel Trace-based Hardware-driven layer-ORiented Natural Gradient Descent Computation method, called THOR, to make second-order optimization applicable to real-world models. Specifically, we gradually increase the update interval and use the matrix trace to determine which blocks of the Fisher Information Matrix (FIM) need to be updated. Moreover, by leveraging the power of the hardware, we design a hardware-driven approximation method for computing the FIM to achieve better performance. To demonstrate the effectiveness of THOR, we have conducted extensive experiments. The results show that training ResNet-50 on ImageNet with THOR takes only 66.7 minutes to reach a top-1 accuracy of 75.9% on 8 Ascend 910 processors with MindSpore, a new deep learning computing framework. With more computational resources, THOR takes only 2.7 minutes to reach 75.9% on 256 Ascend 910 processors.
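
The abstract names a trace-based criterion for deciding which FIM blocks to refresh but does not spell out the rule. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each layer's FIM block is KFAC-style Kronecker-factored, F ≈ A ⊗ G, so the block's trace can be computed from the two small factors via the identity tr(A ⊗ G) = tr(A)·tr(G) without forming the large matrix. It also assumes a hypothetical rule that refreshes a block only when the relative change of its trace since the last refresh exceeds a threshold. The function names and the `threshold` value are illustrative only.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def trace(m: Matrix) -> float:
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def kron_trace(a: Matrix, g: Matrix) -> float:
    """Trace of the Kronecker product A (x) G without forming it.

    Uses the identity tr(A (x) G) = tr(A) * tr(G), so only the two
    small KFAC-style factors are touched.
    """
    return trace(a) * trace(g)


def blocks_to_update(prev_traces: Sequence[float],
                     curr_traces: Sequence[float],
                     threshold: float = 0.01) -> List[int]:
    """Return indices of layers whose FIM-block trace changed enough.

    Hypothetical rule (an assumption, not taken from the paper's text):
    a block is refreshed when the relative change of its trace exceeds
    `threshold`; otherwise its previous (inverse) block is reused.
    """
    selected = []
    for i, (prev, curr) in enumerate(zip(prev_traces, curr_traces)):
        # Relative change of the trace; a zero previous trace always triggers a refresh.
        change = abs(curr - prev) / abs(prev) if prev != 0 else float("inf")
        if change > threshold:
            selected.append(i)
    return selected
```

For example, with previous traces `[15.0, 8.0, 2.0]` and current traces `[15.05, 9.0, 2.0]`, only layer 1 (a 12.5% change) exceeds a 1% threshold, so `blocks_to_update` returns `[1]` and the other two blocks keep their cached values. Skipping blocks whose trace barely moves is what saves computation; the trace is a cheap scalar summary, which is why it suits this role.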

Cite

Text

Chen et al. "THOR, Trace-Based Hardware-Driven Layer-Oriented Natural Gradient Descent Computation." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I8.16867

Markdown

[Chen et al. "THOR, Trace-Based Hardware-Driven Layer-Oriented Natural Gradient Descent Computation." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/chen2021aaai-thor/) doi:10.1609/AAAI.V35I8.16867

BibTeX

@inproceedings{chen2021aaai-thor,
  title     = {{THOR, Trace-Based Hardware-Driven Layer-Oriented Natural Gradient Descent Computation}},
  author    = {Chen, Mengyun and Gao, Kai-Xin and Liu, Xiaolei and Wang, Zidong and Ni, Ningxi and Zhang, Qian and Chen, Lei and Ding, Chao and Huang, Zheng-Hai and Wang, Min and Wang, Shuangling and Yu, Fan and Zhao, Xinyuan and Xu, Dachuan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {7046-7054},
  doi       = {10.1609/AAAI.V35I8.16867},
  url       = {https://mlanthology.org/aaai/2021/chen2021aaai-thor/}
}