Text Genre Classification Based on Linguistic Complexity Contours Using a Recurrent Neural Network
Abstract
In recent years, there has been increased interest in combining natural language processing techniques with machine learning algorithms to automatically classify texts on the basis of a wide range of features. One class of features that has been successfully employed for a variety of classification tasks, including native language identification, readability assessment and text genre categorization, pertains to the construct of ‘linguistic complexity’. This paper presents a novel approach to the use of linguistic complexity features in text categorization: rather than representing text complexity ‘globally’ in terms of summary statistics, this approach assesses text complexity ‘locally’ and captures the progression of complexity within a text as a sequence of complexity scores, generating what is referred to here as ‘complexity contours’. We demonstrate the utility of the approach in an automatic text classification task for five genres – academic, newspaper, fiction, magazine and spoken – of the Corpus of Contemporary American English (COCA) [Davies, 2008] using a recurrent neural network.
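The contrast between a ‘global’ summary statistic and a ‘local’ complexity contour can be illustrated with a minimal sketch. The abstract does not specify how the contours are computed, so everything here is an assumption made for illustration: the sliding window over sentences, the `window` and `step` parameters, the hypothetical function names, and the use of mean word length as a stand-in complexity measure.

```python
from typing import Callable, List


def mean_word_length(sentences: List[str]) -> float:
    """Stand-in complexity measure (an assumption, not the paper's feature set):
    average number of characters per whitespace-separated token."""
    words = [w for s in sentences for w in s.split()]
    return sum(len(w) for w in words) / len(words) if words else 0.0


def complexity_contour(
    sentences: List[str],
    window: int = 2,
    step: int = 1,
    measure: Callable[[List[str]], float] = mean_word_length,
) -> List[float]:
    """Score each window of consecutive sentences, yielding a sequence of
    local complexity scores instead of one global value."""
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    if len(sentences) <= window:
        # Text is no longer than one window: the contour is a single score.
        return [measure(sentences)]
    return [
        measure(sentences[i:i + window])
        for i in range(0, len(sentences) - window + 1, step)
    ]


text = ["a bb", "ccc dddd", "ee"]
global_score = mean_word_length(text)          # one number for the whole text
contour = complexity_contour(text, window=2)   # one number per window
```

For this toy input the global score is 2.4, while the contour is `[2.5, 3.0]`. A sequence of this kind, one score (or one vector of scores when several measures are used) per window, is the sort of input a recurrent neural network can consume step by step.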
Cite
Text
Ströbel et al. "Text Genre Classification Based on Linguistic Complexity Contours Using a Recurrent Neural Network." International Joint Conference on Artificial Intelligence, 2018.
Markdown
[Ströbel et al. "Text Genre Classification Based on Linguistic Complexity Contours Using a Recurrent Neural Network." International Joint Conference on Artificial Intelligence, 2018.](https://mlanthology.org/ijcai/2018/strobel2018ijcai-text/)
BibTeX
@inproceedings{strobel2018ijcai-text,
title = {{Text Genre Classification Based on Linguistic Complexity Contours Using a Recurrent Neural Network}},
author = {Ströbel, Marcus and Kerz, Elma and Wiechmann, Daniel and Qiao, Yu},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2018},
pages = {56-63},
url = {https://mlanthology.org/ijcai/2018/strobel2018ijcai-text/}
}