Approximate Conditional Gradient Descent on Multi-Class Classification

Abstract

Conditional gradient descent, also known as the Frank-Wolfe algorithm, has regained popularity in recent years. The key advantage of Frank-Wolfe is that at each step the expensive projection is replaced with a much more efficient linear optimization step. As with gradient descent, however, the cost of computing the gradient of the loss scales with the data size, so training on big data poses a challenge. Recently, stochastic Frank-Wolfe methods have been proposed to address this problem, but they do not perform well in practice. In this work, we study the problem of approximating the Frank-Wolfe algorithm on large-scale multi-class classification, a typical application of Frank-Wolfe. We present a simple but effective method that exploits the internal structure of the data to approximate Frank-Wolfe on the large-scale multi-class classification problem. Empirical results verify that our method outperforms state-of-the-art stochastic projection-free methods.
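For context, below is a minimal sketch of the generic Frank-Wolfe update the abstract refers to. The quadratic objective, the simplex constraint set, and the 2/(t+2) step size are standard illustrative choices, not the paper's method: they show how the projection step is replaced by a linear minimization oracle, which over the simplex reduces to picking the vertex with the smallest gradient entry.

import numpy as np

def frank_wolfe_simplex(grad, x0, num_iters=100):
    """Generic Frank-Wolfe loop over the probability simplex.

    Instead of projecting onto the feasible set, each iteration solves
    a linear problem: min_s <grad(x), s> over the simplex, whose
    solution is the basis vector with the smallest gradient entry.
    """
    x = x0.copy()
    for t in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: best vertex of the simplex.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# Illustrative use: minimize ||Ax - b||^2 over the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: 2.0 * A.T @ (A @ x - b)
x_star = frank_wolfe_simplex(grad, np.full(5, 0.2))

Note that each iterate is a convex combination of simplex vertices, so the method stays feasible without ever projecting; this is the property the paper builds on for multi-class classification.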

Cite

Text

Liu and Tsang. "Approximate Conditional Gradient Descent on Multi-Class Classification." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10915

Markdown

[Liu and Tsang. "Approximate Conditional Gradient Descent on Multi-Class Classification." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/liu2017aaai-approximate/) doi:10.1609/AAAI.V31I1.10915

BibTeX

@inproceedings{liu2017aaai-approximate,
  title     = {{Approximate Conditional Gradient Descent on Multi-Class Classification}},
  author    = {Liu, Zhuanghua and Tsang, Ivor W.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {2301--2307},
  doi       = {10.1609/AAAI.V31I1.10915},
  url       = {https://mlanthology.org/aaai/2017/liu2017aaai-approximate/}
}