Scalable Second-Order Riemannian Optimization for $k$-Means Clustering
Abstract
Clustering is a hard discrete optimization problem. Nonconvex approaches such as low-rank semidefinite programming (SDP) have recently demonstrated promising statistical and local algorithmic guarantees for cluster recovery. Due to the combinatorial structure of the $K$-means clustering problem, current relaxation algorithms struggle to balance their constraint feasibility and objective optimality, presenting tremendous challenges in computing the second-order critical points with rigorous guarantees. In this paper, we provide a new formulation of the $K$-means problem as a smooth unconstrained optimization over a submanifold and characterize its Riemannian structures to allow it to be solved using a second-order cubic-regularized Riemannian Newton algorithm. By factorizing the $K$-means manifold into a product manifold, we show how each Newton subproblem can be solved in linear time. Our numerical experiments show that the proposed method converges significantly faster than the state-of-the-art first-order nonnegative low-rank factorization method, while achieving similarly optimal statistical accuracy.
Cite
Text
Xu et al. "Scalable Second-Order Riemannian Optimization for $k$-Means Clustering." International Conference on Learning Representations, 2026.Markdown
[Xu et al. "Scalable Second-Order Riemannian Optimization for $k$-Means Clustering." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xu2026iclr-scalable/)BibTeX
@inproceedings{xu2026iclr-scalable,
title = {{Scalable Second-Order Riemannian Optimization for $k$-Means Clustering}},
author = {Xu, Peng and Hou, Chun Ying and Chen, Xiaohui and Zhang, Richard Y.},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/xu2026iclr-scalable/}
}