Gated POS-Level Language Model for Authorship Verification
Abstract
Authorship verification is an important problem with many applications. State-of-the-art deep authorship verification methods typically leverage character-level language models to encode author-specific writing styles. However, they often fail to capture syntactic-level patterns, leading to sub-optimal accuracy in cross-topic scenarios. Also, due to imperfect cross-author parameter sharing, it is difficult for them to distinguish author-specific writing style from common patterns, leading to data-inefficient learning. This paper introduces a novel POS-level (Part of Speech) gated RNN-based language model to effectively learn author-specific syntactic styles. The author-agnostic syntactic information obtained from a POS tagger pre-trained on large external datasets greatly reduces the number of effective parameters of our model, enabling the model to learn accurate author-specific syntactic styles with limited training data. We also utilize a gated architecture to learn the common syntactic writing styles with a small set of shared parameters and let the author-specific parameters focus on each author's special syntactic styles. Extensive experimental results show that our method achieves significantly better accuracy than state-of-the-art competing methods, especially in cross-topic scenarios (an improvement of over 5% in AUC-ROC).
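As a rough illustration of the gating idea in the abstract, the sketch below scores a POS-tag sequence under a mixture of a shared model and an author-specific model. It is not the paper's model: it substitutes add-alpha smoothed bigram counts for the gated RNN, uses a fixed scalar gate instead of a learned one, and assumes the text has already been POS-tagged. All function names, tag sequences, and the gate value are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigram_counts(tag_sequences):
    """Count POS-tag bigrams, with a start symbol before each sequence."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        prev = "<s>"
        for tag in tags:
            counts[prev][tag] += 1
            prev = tag
    return counts


def smoothed_prob(counts, prev, tag, vocab, alpha=1.0):
    """Add-alpha smoothed estimate of P(tag | prev)."""
    row = counts.get(prev, {})
    total = sum(row.values())
    return (row.get(tag, 0) + alpha) / (total + alpha * len(vocab))


def gated_log_likelihood(tags, shared, author, vocab, gate=0.5):
    """Average per-tag log-probability under a gated mixture.

    `gate` weights the author-specific model and `1 - gate` the shared
    model. A fixed scalar is an assumption made for this sketch.
    """
    prev, total = "<s>", 0.0
    for tag in tags:
        p_author = smoothed_prob(author, prev, tag, vocab)
        p_shared = smoothed_prob(shared, prev, tag, vocab)
        total += math.log(gate * p_author + (1.0 - gate) * p_shared)
        prev = tag
    return total / max(len(tags), 1)


# Toy, hand-written POS sequences (hypothetical data, not from the paper).
vocab = ["DET", "ADJ", "NOUN", "VERB", "ADV", "PRON"]
author_a_docs = [["DET", "ADJ", "NOUN", "VERB"],
                 ["DET", "ADJ", "NOUN", "VERB", "ADV"]]
author_b_docs = [["PRON", "VERB", "ADV"],
                 ["PRON", "ADV", "VERB", "NOUN"]]

shared = bigram_counts(author_a_docs + author_b_docs)  # common patterns
author_a = bigram_counts(author_a_docs)                # author-specific patterns

score_same = gated_log_likelihood(["DET", "ADJ", "NOUN", "VERB"],
                                  shared, author_a, vocab)
score_diff = gated_log_likelihood(["PRON", "ADV", "VERB"],
                                  shared, author_a, vocab)
print(score_same > score_diff)
```

In a verification setting, a score like this could be thresholded to decide whether a questioned text matches the candidate author. Because the scores depend only on POS tags, they do not depend directly on topic vocabulary, which is the motivation the abstract gives for the cross-topic setting.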
Cite
Text
Ouyang et al. "Gated POS-Level Language Model for Authorship Verification." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/557
Markdown
[Ouyang et al. "Gated POS-Level Language Model for Authorship Verification." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/ouyang2020ijcai-gated/) doi:10.24963/IJCAI.2020/557
BibTeX
@inproceedings{ouyang2020ijcai-gated,
title = {{Gated POS-Level Language Model for Authorship Verification}},
author = {Ouyang, Linshu and Zhang, Yongzheng and Liu, Hui and Chen, Yige and Wang, Yipeng},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2020},
pages = {4025--4031},
doi = {10.24963/IJCAI.2020/557},
url = {https://mlanthology.org/ijcai/2020/ouyang2020ijcai-gated/}
}