An Efficient Method to Estimate Labelled Sample Size for Transductive LDA(QDA/MDA) Based on Bayes Risk

Abstract

As semi-supervised classification draws more attention, many practical semi-supervised learning methods have been proposed. However, one important issue has been ignored by the current literature: how to estimate the exact number of labelled samples needed given many unlabelled samples. Such an estimation method is important because labelled examples are scarce and expensive. It is also crucial for exploring the relative value of labelled and unlabelled samples under a specific model. Assuming a latent Gaussian distribution over the domain, we describe a method to estimate the number of labels a dataset requires for a semi-supervised linear discriminant classifier (Transductive LDA) to reach a desired accuracy. Our technique extends naturally to two difficult problems: learning from Gaussian distributions with different covariances, and learning with multiple classes. The method is evaluated on two datasets, a toy dataset and a real-world wine dataset. The results of this research can be used in areas such as text mining, information retrieval, and bioinformatics.
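The abstract uses the Bayes risk as the reference point for "a desired accuracy". The sketch below is an illustration under strong simplifying assumptions and does not reproduce the paper's procedure. It uses plain supervised LDA rather than Transductive LDA, works in one dimension, assumes two equally likely classes that share a variance, and uses a balanced labelled set. For two Gaussians with a shared covariance and equal priors, the Bayes risk is Φ(-Δ/2), where Δ is the Mahalanobis distance between the class means. In 1-D this is |μ1 - μ0|/σ. The helper names (`bayes_risk`, `lda_error`, `labelled_size_needed`), the tolerance `eps`, and the Monte Carlo search over the per-class sample size are all hypothetical choices made for this example.

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_risk(mu0: float, mu1: float, sigma: float) -> float:
    """Bayes error for two equally likely 1-D Gaussians sharing variance sigma**2."""
    return normal_cdf(-abs(mu1 - mu0) / (2.0 * sigma))


def lda_error(t: float, class1_above: bool, mu0: float, mu1: float, sigma: float) -> float:
    """Exact true error of a threshold rule at t under the assumed 1-D model."""
    p1_above = 1.0 - normal_cdf((t - mu1) / sigma)  # P(x > t | class 1)
    p0_above = 1.0 - normal_cdf((t - mu0) / sigma)  # P(x > t | class 0)
    if class1_above:
        # Rule predicts class 1 for x > t: class-1 points below t and class-0 points above t are errors.
        return 0.5 * (1.0 - p1_above) + 0.5 * p0_above
    # Rule predicts class 1 for x < t: the two error regions swap.
    return 0.5 * p1_above + 0.5 * (1.0 - p0_above)


def expected_lda_error(n, mu0, mu1, sigma, trials, rng):
    """Monte Carlo average of the true error of 1-D LDA fitted on n labelled points per class."""
    total = 0.0
    for _ in range(trials):
        m0 = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        m1 = sum(rng.gauss(mu1, sigma) for _ in range(n)) / n
        # With equal class counts the pooled variance cancels and the LDA boundary is the midpoint.
        t = 0.5 * (m0 + m1)
        total += lda_error(t, m1 > m0, mu0, mu1, sigma)
    return total / trials


def labelled_size_needed(mu0, mu1, sigma, eps, max_n=1000, trials=2000, seed=0):
    """Hypothetical helper: smallest n per class whose expected error is within eps of the Bayes risk."""
    rng = random.Random(seed)
    target = bayes_risk(mu0, mu1, sigma) + eps
    for n in range(1, max_n + 1):
        if expected_lda_error(n, mu0, mu1, sigma, trials, rng) <= target:
            return n
    return None  # tolerance not reached within max_n
```

For example, `labelled_size_needed(0.0, 2.0, 1.0, eps=0.01)` searches for the smallest balanced labelled set whose expected LDA error comes within one percentage point of the Bayes risk Φ(-1). The search is a brute-force stand-in chosen for transparency. Because the threshold rule placed at the true midpoint attains exactly the Bayes risk, every term being averaged is at least that risk, so `eps` directly measures the excess error the labels have not yet removed.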

Cite

Text

Liu et al. "An Efficient Method to Estimate Labelled Sample Size for Transductive LDA(QDA/MDA) Based on Bayes Risk." European Conference on Machine Learning, 2004. doi:10.1007/978-3-540-30115-8_27

Markdown

[Liu et al. "An Efficient Method to Estimate Labelled Sample Size for Transductive LDA(QDA/MDA) Based on Bayes Risk." European Conference on Machine Learning, 2004.](https://mlanthology.org/ecmlpkdd/2004/liu2004ecml-efficient/) doi:10.1007/978-3-540-30115-8_27

BibTeX

@inproceedings{liu2004ecml-efficient,
  title     = {{An Efficient Method to Estimate Labelled Sample Size for Transductive LDA(QDA/MDA) Based on Bayes Risk}},
  author    = {Liu, Han and Yuan, Xiaobin and Tang, Qianying and Kustra, Rafal},
  booktitle = {European Conference on Machine Learning},
  year      = {2004},
  pages     = {274--285},
  doi       = {10.1007/978-3-540-30115-8_27},
  url       = {https://mlanthology.org/ecmlpkdd/2004/liu2004ecml-efficient/}
}