New Algorithms for Trace-Ratio Problem with Application to High-Dimension and Large-Sample Data Dimensionality Reduction

Abstract

Learning from large-scale, high-dimensional data sets is a central concern in research areas such as machine learning, visual recognition, and information retrieval. In many practical applications, such as image, video, audio, and text processing, we face high-dimension and large-sample data problems. The trace-ratio problem is a key problem in feature extraction and dimensionality reduction, which are used to circumvent the difficulties of working in a high-dimensional space. However, it has long been believed that this problem has no closed-form solution, and that one has to solve it with inner-outer iterative algorithms that are very time consuming. Consequently, efficient algorithms for high-dimension and large-sample trace-ratio problems are still lacking, especially for dense data. In this work, we present a closed-form solution for the trace-ratio problem and propose two algorithms to solve it. Based on this closed-form solution and the randomized singular value decomposition, we first propose a randomized algorithm for high-dimension and large-sample dense trace-ratio problems. For high-dimension and large-sample sparse trace-ratio problems, we then propose an algorithm based on the closed-form solution and on solving a set of consistent under-determined linear systems. Theoretical results are established to show the rationality and efficiency of the proposed methods. Numerical experiments on real-world data sets illustrate the superiority of the proposed algorithms over many state-of-the-art algorithms for high-dimension and large-sample dimensionality reduction problems.
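
For readers unfamiliar with the randomized singular value decomposition mentioned in the abstract, the following is a minimal sketch of the generic randomized range-finder scheme (in the style of Halko, Martinsson, and Tropp). It is not the authors' algorithm and does not reproduce their closed-form solution, which the abstract does not state. The function name `randomized_svd` and the parameters `oversample`, `n_power_iter`, and `seed` are illustrative assumptions.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-`rank` SVD of A with a random projection.

    Hypothetical helper for illustration only; the parameter names are
    assumptions and do not come from the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample a few extra directions to improve the captured subspace.
    k = min(rank + oversample, m, n)

    # Random Gaussian test matrix; A @ Omega samples the range of A.
    Omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ Omega)

    # Power iterations with re-orthonormalization sharpen the subspace
    # when the singular values decay slowly.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Project A onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    return U[:, :rank], s[:rank], Vt[:rank]
```

The appeal for dense, high-dimension and large-sample data is that the only expensive operations are a handful of products with `A` and its transpose, while the exact SVD is computed on a matrix with only `rank + oversample` rows.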

Cite

Text

Shi and Wu. "New Algorithms for Trace-Ratio Problem with Application to High-Dimension and Large-Sample Data Dimensionality Reduction." Machine Learning, 2024. doi:10.1007/S10994-020-05937-W

Markdown

[Shi and Wu. "New Algorithms for Trace-Ratio Problem with Application to High-Dimension and Large-Sample Data Dimensionality Reduction." Machine Learning, 2024.](https://mlanthology.org/mlj/2024/shi2024mlj-new/) doi:10.1007/S10994-020-05937-W

BibTeX

@article{shi2024mlj-new,
  title     = {{New Algorithms for Trace-Ratio Problem with Application to High-Dimension and Large-Sample Data Dimensionality Reduction}},
  author    = {Shi, Wenya and Wu, Gang},
  journal   = {Machine Learning},
  year      = {2024},
  pages     = {3889-3916},
  doi       = {10.1007/S10994-020-05937-W},
  volume    = {113},
  url       = {https://mlanthology.org/mlj/2024/shi2024mlj-new/}
}