From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes

Abstract

We focus on the classification problem with a separable dataset, one of the most important and classical problems in machine learning. The standard approach to this task is logistic regression with gradient descent (LR+GD). Recent studies have observed that LR+GD can find a solution with arbitrarily large step sizes, defying conventional optimization theory. Our work investigates this phenomenon and makes three interconnected key observations about LR+GD with large step sizes. First, we find a remarkably simple explanation of why LR+GD with large step sizes solves the classification problem: LR+GD reduces to a batch version of the celebrated perceptron algorithm as the step size tends to infinity. Second, we observe that larger step sizes lead LR+GD to higher logistic losses as it approaches the perceptron algorithm, yet larger step sizes also lead to faster convergence to a solution of the classification problem, meaning that logistic loss is an unreliable metric of proximity to a solution. Surprisingly, high loss values can actually indicate faster convergence. Third, since the convergence rate of LR+GD in terms of loss function values is unreliable, we examine the iteration complexity required by LR+GD with large step sizes to solve the classification problem and prove that this complexity is suboptimal. To address this, we propose a new method, Normalized LR+GD, based on the connection between LR+GD and the perceptron algorithm, with much better theoretical guarantees.
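The first observation can be illustrated numerically. For the logistic loss, the negative gradient is a weighted sum of terms y_i x_i with weights sigmoid(-y_i w·x_i); as the iterate is scaled up (as happens with a large step size), these weights saturate to 1 on misclassified points and 0 on correctly classified ones, leaving exactly the batch perceptron update. The sketch below is not the paper's code; the dataset, weight vector, and function names are illustrative assumptions made for this demonstration.

```python
import numpy as np

def sigmoid(z):
    # numerically stable logistic function (clip avoids overflow in exp)
    z = np.clip(z, -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad(w, X, y):
    # gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))
    margins = y * (X @ w)
    return -(X.T @ (y * sigmoid(-margins))) / len(y)

def batch_perceptron_dir(w, X, y):
    # batch perceptron update direction: sum of y_i x_i over misclassified points
    mis = y * (X @ w) <= 0
    return (y[mis, None] * X[mis]).sum(axis=0)

# synthetic separable data (illustrative, not from the paper)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
w = np.array([1.0, -0.5])  # a weight vector that misclassifies some points

p = batch_perceptron_dir(w, X, y)
for c in [1, 10, 1000]:
    # scaling the iterate mimics the large-step-size regime
    g = -logistic_grad(c * w, X, y)
    cos = g @ p / (np.linalg.norm(g) * np.linalg.norm(p))
    print(f"scale {c}: cosine with perceptron direction = {cos:.4f}")
```

As the scale grows, the cosine similarity between the negative logistic gradient and the batch perceptron direction approaches 1, matching the limit described above.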

Cite

Text

Tyurin. "From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I20.35389

Markdown

[Tyurin. "From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/tyurin2025aaai-logistic/) doi:10.1609/AAAI.V39I20.35389

BibTeX

@inproceedings{tyurin2025aaai-logistic,
  title     = {{From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes}},
  author    = {Tyurin, Alexander},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {20938--20946},
  doi       = {10.1609/AAAI.V39I20.35389},
  url       = {https://mlanthology.org/aaai/2025/tyurin2025aaai-logistic/}
}