fBGD: Learning Embeddings from Positive Unlabeled Data with BGD

Abstract

Learning sparse features from only positive and unlabeled (PU) data is a fundamental task in several domains, such as natural language processing (NLP), computer vision (CV), and information retrieval (IR). Given the large amount of unlabeled data, most prevalent methods rely on negative sampling (NS) to improve computational efficiency. However, sampling only a fraction of the unlabeled data as negatives for training may ignore other important examples and thus lead to suboptimal prediction performance. To address this, we present a fast and generic batch gradient descent optimizer ($f_{BGD}$) that learns from all training examples without sampling. By leveraging the sparsity of PU data, we accelerate $f_{BGD}$ by several orders of magnitude, bringing its time complexity to the same level as that of NS-based stochastic gradient descent. Meanwhile, we observe that the standard batch gradient method suffers from gradient instability caused by this sparsity. A theoretical analysis of this potential cause leads naturally to an intuitive solution. To verify its efficacy, we perform experiments on multiple PU-data tasks across domains and show that $f_{BGD}$ consistently outperforms NS-based models on all tasks with comparable efficiency.
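The abstract does not spell out how the sparsity is exploited. One well-known way for whole-data (non-sampling) learners to avoid visiting every unlabeled pair is to rewrite the sum over all pairs with cached Gram matrices. This makes the cost scale with the number of positives plus a d² term per embedding, not with users × items. The sketch below is a minimal illustration under stated assumptions, not the paper's exact $f_{BGD}$ formulation. It assumes a matrix-factorization score s_ui = p_u · q_i and a weighted squared loss, with target 1 on positive pairs and target 0 with a uniform weight c0 on every unlabeled pair. All function names are hypothetical. It also does not model the gradient-instability issue or the authors' fix for it.

```python
import random

# Hypothetical helpers for illustration; NOT taken from the paper.
# Assumed loss: sum over positives of (1 - s)^2 + c0 * sum over unlabeled of s^2.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gram(M):
    """d x d Gram matrix M^T M for a list of d-dimensional rows."""
    d = len(M[0])
    return [[sum(row[k] * row[l] for row in M) for l in range(d)] for k in range(d)]


def full_loss_naive(P, Q, positives, c0):
    """Whole-data loss by brute force over every (user, item) pair: O(U * I * d)."""
    loss = 0.0
    for u, pu in enumerate(P):
        for i, qi in enumerate(Q):
            s = dot(pu, qi)
            loss += (1.0 - s) ** 2 if (u, i) in positives else c0 * s * s
    return loss


def full_loss_fast(P, Q, positives, c0):
    """Same loss in O((U + I) * d^2 + |positives| * d) using cached Gram matrices."""
    GP, GQ = gram(P), gram(Q)
    d = len(GP)
    # c0 * sum over ALL pairs of s_ui^2 equals c0 * sum_kl (P^T P)_kl * (Q^T Q)_kl.
    loss = c0 * sum(GP[k][l] * GQ[k][l] for k in range(d) for l in range(d))
    # Positives were counted as c0 * s^2 above; swap in their true term (1 - s)^2.
    for u, i in positives:
        s = dot(P[u], Q[i])
        loss += (1.0 - s) ** 2 - c0 * s * s
    return loss


def user_grads_naive(P, Q, positives, c0):
    """Batch gradient w.r.t. every user vector, by brute force over all pairs."""
    d = len(P[0])
    grads = [[0.0] * d for _ in P]
    for u, pu in enumerate(P):
        for i, qi in enumerate(Q):
            s = dot(pu, qi)
            coef = -2.0 * (1.0 - s) if (u, i) in positives else 2.0 * c0 * s
            for k in range(d):
                grads[u][k] += coef * qi[k]
    return grads


def user_grads_fast(P, Q, positives, c0):
    """Same gradient: 2*c0*(Q^T Q) p_u over all pairs, plus a per-positive correction."""
    GQ = gram(Q)  # symmetric, so row k dotted with p_u is (GQ p_u)_k
    d = len(GQ)
    grads = [[2.0 * c0 * dot(GQ[k], pu) for k in range(d)] for pu in P]
    for u, i in positives:
        s = dot(P[u], Q[i])
        coef = -2.0 * (1.0 - s) - 2.0 * c0 * s  # true positive term minus the c0 part
        for k in range(d):
            grads[u][k] += coef * Q[i][k]
    return grads


def bgd_step_users(P, Q, positives, c0, lr):
    """One full-batch gradient step on the user vectors (items: swap P and Q)."""
    G = user_grads_fast(P, Q, positives, c0)
    return [[p - lr * g for p, g in zip(pu, gu)] for pu, gu in zip(P, G)]


# Tiny synthetic example: 30 users, 50 items, 4-dimensional embeddings.
random.seed(0)
d, c0 = 4, 0.1
P = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(30)]
Q = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(50)]
positives = {(random.randrange(30), random.randrange(50)) for _ in range(80)}

print(full_loss_naive(P, Q, positives, c0), full_loss_fast(P, Q, positives, c0))
P_next = bgd_step_users(P, Q, positives, c0, lr=0.01)
print(full_loss_fast(P_next, Q, positives, c0))
```

Comparing the fast quantities with the brute-force ones on a small problem is a cheap sanity check. The savings only matter when users × items is far larger than the number of positives, which is the sparse PU setting described above. The item-side gradient follows by swapping the roles of P and Q.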

Cite

Text

Yuan et al. "fBGD: Learning Embeddings from Positive Unlabeled Data with BGD." Conference on Uncertainty in Artificial Intelligence, 2018.

Markdown

[Yuan et al. "fBGD: Learning Embeddings from Positive Unlabeled Data with BGD." Conference on Uncertainty in Artificial Intelligence, 2018.](https://mlanthology.org/uai/2018/yuan2018uai-fbgd/)

BibTeX

@inproceedings{yuan2018uai-fbgd,
  title     = {{fBGD: Learning Embeddings from Positive Unlabeled Data with BGD}},
  author    = {Yuan, Fajie and Xin, Xin and He, Xiangnan and Guo, Guibing and Zhang, Weinan and Chua, Tat-Seng and Jose, Joemon M.},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2018},
  pages     = {198--207},
  url       = {https://mlanthology.org/uai/2018/yuan2018uai-fbgd/}
}