Gradient Descent Fails to Learn High-Frequency Functions and Modular Arithmetic
Abstract
Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by Statistical Query algorithms. Recently this classical fact re-emerged in the theory of gradient-based optimization of neural networks. In this framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function. The set of functions of the form $x \rightarrow ax \bmod p$, where $a$ is taken from $\mathbb{Z}_p$, has recently attracted attention from deep learning theorists and cryptographers. This class can be understood as a subset of $p$-periodic functions on $\mathbb{Z}$ and is tightly connected with a class of high-frequency periodic functions on the real line. We present a mathematical analysis of the limitations and challenges of using gradient-based techniques to learn a high-frequency periodic function or modular multiplication from examples. We show that the variance of the gradient is negligibly small in both cases when either the frequency or the prime base $p$ is large, which in turn prevents such a learning algorithm from being successful.
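The hardness measure described in the abstract can be probed empirically. Below is a minimal NumPy sketch (not the authors' experimental setup) that estimates, for a fixed randomly initialized network, the variance over $a$ of the loss gradient when the target is a circle embedding of $x \rightarrow ax \bmod p$. The tanh architecture, the cosine target embedding, the hidden width, and the choice of primes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_wrt_v(p, a, W, b, v):
    """dL/dv for the squared loss of a small tanh network against the
    target x -> cos(2*pi*(a*x mod p)/p), averaged over all x in Z_p."""
    x = np.arange(p)
    u = np.stack([np.cos(2 * np.pi * x / p),     # embed x on the unit circle
                  np.sin(2 * np.pi * x / p)])    # shape (2, p)
    h = np.tanh(W @ u + b[:, None])              # hidden activations, (m, p)
    f = v @ h                                    # network output, (p,)
    y = np.cos(2 * np.pi * (a * x % p) / p)      # embedding of ax mod p
    return 2 * np.mean((f - y) * h, axis=1)      # gradient w.r.t. v, (m,)

m = 16  # hidden width (illustrative choice)
for p in [101, 1009, 10007]:  # primes of increasing size
    W = rng.normal(size=(m, 2))
    b = rng.normal(size=m)
    v = rng.normal(size=m)
    # Sample target slopes a and estimate the per-coordinate variance of the
    # gradient over the random choice of a; the a-independent part of the
    # gradient is common to all targets and cancels in the variance.
    aa = rng.choice(np.arange(1, p), size=min(p - 1, 2000), replace=False)
    grads = np.stack([grad_wrt_v(p, int(a), W, b, v) for a in aa])
    print(f"p = {p:6d}   mean gradient variance over a: "
          f"{grads.var(axis=0).mean():.3e}")
```

On typical runs the printed variance shrinks at least like $1/p$: the target-dependent part of the gradient is a Fourier coefficient of a smooth fixed function at frequency $a$, and for most $a \in \mathbb{Z}_p$ that coefficient is tiny, which is the qualitative phenomenon the paper analyzes.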
Cite
Text

Takhanov et al. "Gradient Descent Fails to Learn High-Frequency Functions and Modular Arithmetic." Machine Learning, 2025. doi:10.1007/S10994-025-06747-8

Markdown

[Takhanov et al. "Gradient Descent Fails to Learn High-Frequency Functions and Modular Arithmetic." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/takhanov2025mlj-gradient/) doi:10.1007/S10994-025-06747-8

BibTeX
@article{takhanov2025mlj-gradient,
title = {{Gradient Descent Fails to Learn High-Frequency Functions and Modular Arithmetic}},
author = {Takhanov, Rustem and Tezekbayev, Maxat and Pak, Artur and Bolatov, Arman and Assylbekov, Zhenisbek},
journal = {Machine Learning},
year = {2025},
pages = {117},
doi = {10.1007/S10994-025-06747-8},
volume = {114},
url = {https://mlanthology.org/mlj/2025/takhanov2025mlj-gradient/}
}