Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks

Abstract

State-of-the-art Chinese word segmentation systems typically exploit supervised models trained on a standard manually annotated corpus, achieving performance above 95% on a similar standard test corpus. However, performance may drop significantly when the same models are applied to Chinese microtext. One major challenge is the presence of informal words in microtext. Previous studies show that informal word detection can be helpful for microtext processing. In this work, we investigate it in the neural setting by proposing a joint segmentation model that simultaneously detects informal words. In addition, we automatically generate a training corpus for the joint model from existing corpora. Experimental results show that the proposed model is highly effective for segmentation of Chinese microtext.

Cite

Text

Zhang et al. "Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks." International Joint Conference on Artificial Intelligence, 2017. doi:10.24963/IJCAI.2017/591

Markdown

[Zhang et al. "Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks." International Joint Conference on Artificial Intelligence, 2017.](https://mlanthology.org/ijcai/2017/zhang2017ijcai-segmenting/) doi:10.24963/IJCAI.2017/591

BibTeX

@inproceedings{zhang2017ijcai-segmenting,
  title     = {{Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks}},
  author    = {Zhang, Meishan and Fu, Guohong and Yu, Nan},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {4228--4234},
  doi       = {10.24963/IJCAI.2017/591},
  url       = {https://mlanthology.org/ijcai/2017/zhang2017ijcai-segmenting/}
}