A Semi-Discriminative Approach for Sub-Sentence Level Topic Classification on a Small Dataset

Abstract

This paper aims to identify sequences of words related to specific product components in online product reviews. A Max Entropy classifier, which assumes that subsequent topics are independent, provides a reliable baseline for this topic classification problem. However, the reviews exhibit an inherent structure at the document level, which allows the task to be framed as a sequence classification problem. Since more flexible models from the class of Conditional Random Fields were not competitive given the limited amount of training data available, we propose using a Hidden Markov Model instead and decoupling the training of transition and emission probabilities. The discriminative power of the Max Entropy approach is used for the latter. Besides outperforming both standalone methods as well as more generic models such as linear-chain Conditional Random Fields, the combined classifier is able to assign topics at the sub-sentence level, although labels in the training data are only available at the sentence level.
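The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the add-one smoothing, the toy topics and numbers, and the "scaled likelihood" step (dividing classifier posteriors by topic priors to obtain emission scores) are all assumptions made for illustration. Transition probabilities are counted from label sequences, emission scores come from any discriminative classifier's per-token posteriors, and Viterbi decoding combines the two.

```python
import math
from collections import Counter

def estimate_transitions(label_sequences, topics, smoothing=1.0):
    """Count topic-to-topic transitions, with additive smoothing (assumed)."""
    counts = {a: Counter() for a in topics}
    for seq in label_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    trans = {}
    for a in topics:
        total = sum(counts[a].values()) + smoothing * len(topics)
        trans[a] = {b: (counts[a][b] + smoothing) / total for b in topics}
    return trans

def viterbi(posteriors, trans, priors, topics):
    """Decode the most likely topic sequence.

    posteriors: one dict per token, topic -> P(topic | token), as produced
    by any discriminative classifier (hypothetical input here).
    """
    def emit(t, s):
        # Assumed scaled likelihood: P(token | topic) is proportional
        # to P(topic | token) / P(topic).
        return math.log(posteriors[t][s]) - math.log(priors[s])

    score = [{s: math.log(priors[s]) + emit(0, s) for s in topics}]
    back = []
    for t in range(1, len(posteriors)):
        row, ptr = {}, {}
        for s in topics:
            best = max(topics, key=lambda p: score[-1][p] + math.log(trans[p][s]))
            row[s] = score[-1][best] + math.log(trans[best][s]) + emit(t, s)
            ptr[s] = best
        score.append(row)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(topics, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy data (invented for illustration only).
topics = ["battery", "display"]
training_labels = [
    ["battery", "battery", "display", "display"],
    ["display", "display", "battery", "battery"],
]
trans = estimate_transitions(training_labels, topics)
priors = {"battery": 0.5, "display": 0.5}
posteriors = [
    {"battery": 0.90, "display": 0.10},
    {"battery": 0.45, "display": 0.55},  # weakly favours "display" in isolation
    {"battery": 0.80, "display": 0.20},
    {"battery": 0.10, "display": 0.90},
    {"battery": 0.20, "display": 0.80},
]
path = viterbi(posteriors, trans, priors, topics)
print(path)
```

In this toy run, classifying each token independently would label the second token "display", while the decoded sequence keeps it as "battery" because the estimated transitions favour staying in the same topic. Decoding over tokens rather than whole sentences is also what would let topic boundaries fall inside a sentence.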

Cite

Text

Ferner and Wegenkittl. "A Semi-Discriminative Approach for Sub-Sentence Level Topic Classification on a Small Dataset." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019. doi:10.1007/978-3-030-46147-8_42

Markdown

[Ferner and Wegenkittl. "A Semi-Discriminative Approach for Sub-Sentence Level Topic Classification on a Small Dataset." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019.](https://mlanthology.org/ecmlpkdd/2019/ferner2019ecmlpkdd-semidiscriminative/) doi:10.1007/978-3-030-46147-8_42

BibTeX

@inproceedings{ferner2019ecmlpkdd-semidiscriminative,
  title     = {{A Semi-Discriminative Approach for Sub-Sentence Level Topic Classification on a Small Dataset}},
  author    = {Ferner, Cornelia and Wegenkittl, Stefan},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2019},
  pages     = {697--710},
  doi       = {10.1007/978-3-030-46147-8_42},
  url       = {https://mlanthology.org/ecmlpkdd/2019/ferner2019ecmlpkdd-semidiscriminative/}
}