Scalable Second-Order Riemannian Optimization for $k$-Means Clustering

Abstract

Clustering is a hard discrete optimization problem. Nonconvex approaches such as low-rank semidefinite programming (SDP) have recently demonstrated promising statistical and local algorithmic guarantees for cluster recovery. Due to the combinatorial structure of the $K$-means clustering problem, current relaxation algorithms struggle to balance their constraint feasibility and objective optimality, presenting tremendous challenges in computing the second-order critical points with rigorous guarantees. In this paper, we provide a new formulation of the $K$-means problem as a smooth unconstrained optimization over a submanifold and characterize its Riemannian structures to allow it to be solved using a second-order cubic-regularized Riemannian Newton algorithm. By factorizing the $K$-means manifold into a product manifold, we show how each Newton subproblem can be solved in linear time. Our numerical experiments show that the proposed method converges significantly faster than the state-of-the-art first-order nonnegative low-rank factorization method, while achieving similarly optimal statistical accuracy.

Cite

Text

Xu et al. "Scalable Second-Order Riemannian Optimization for $k$-Means Clustering." International Conference on Learning Representations, 2026.

Markdown

[Xu et al. "Scalable Second-Order Riemannian Optimization for $k$-Means Clustering." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xu2026iclr-scalable/)

BibTeX

@inproceedings{xu2026iclr-scalable,
  title     = {{Scalable Second-Order Riemannian Optimization for $k$-Means Clustering}},
  author    = {Xu, Peng and Hou, Chun Ying and Chen, Xiaohui and Zhang, Richard Y.},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/xu2026iclr-scalable/}
}