Radical-Level Ideograph Encoder for RNN-Based Sentiment Analysis of Chinese and Japanese

Abstract

The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models that process such languages very large. We explored a model for sentiment classification that takes as input the embeddings of the radicals of Chinese characters, i.e., hanzi in Chinese and kanji in Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results are on par with character embedding-based models and close to state-of-the-art word embedding-based models, with a 90% smaller vocabulary and at least 13% and 80% fewer parameters than the character embedding-based and word embedding-based models, respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.
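
To illustrate why a radical-level vocabulary can be smaller than a character-level one, here is a minimal sketch, not the authors' implementation. The `RADICALS` table, the helper names `to_radicals` and `embedding_params`, and the embedding width `EMBED_DIM` are all hypothetical choices made for illustration. The six characters were picked because they share three components, so the ratio in this toy case says nothing about the 90% figure reported above.

```python
# Toy character-to-radical table (hypothetical; a real system would load a
# full decomposition resource rather than a hand-written dictionary).
RADICALS = {
    "林": ["木", "木"],
    "森": ["木", "木", "木"],
    "明": ["日", "月"],
    "朋": ["月", "月"],
    "晶": ["日", "日", "日"],
    "昌": ["日", "日"],
}


def to_radicals(text):
    """Replace each known character with its radical sequence.

    Characters missing from the table are passed through unchanged.
    """
    out = []
    for ch in text:
        out.extend(RADICALS.get(ch, [ch]))
    return out


def embedding_params(vocab_size, dim):
    """Number of weights in an embedding table of shape (vocab_size, dim)."""
    return vocab_size * dim


# Vocabulary seen by a character-level model versus a radical-level model.
char_vocab = set(RADICALS)
radical_vocab = {r for parts in RADICALS.values() for r in parts}

EMBED_DIM = 8  # arbitrary width, chosen only for this example

print(to_radicals("明林"))
print(len(char_vocab), len(radical_vocab))
print(embedding_params(len(char_vocab), EMBED_DIM),
      embedding_params(len(radical_vocab), EMBED_DIM))
```

In this sketch the embedding table shrinks in proportion to the vocabulary, while each character becomes a longer input sequence. That trade-off is one reason a radical-level model would need an encoder that recombines radicals into higher-level features.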

Cite

Text

Ke and Hagiwara. "Radical-Level Ideograph Encoder for RNN-Based Sentiment Analysis of Chinese and Japanese." Proceedings of the Ninth Asian Conference on Machine Learning, 2017.

Markdown

[Ke and Hagiwara. "Radical-Level Ideograph Encoder for RNN-Based Sentiment Analysis of Chinese and Japanese." Proceedings of the Ninth Asian Conference on Machine Learning, 2017.](https://mlanthology.org/acml/2017/ke2017acml-radicallevel/)

BibTeX

@inproceedings{ke2017acml-radicallevel,
  title     = {{Radical-Level Ideograph Encoder for RNN-Based Sentiment Analysis of Chinese and Japanese}},
  author    = {Ke, Yuanzhi and Hagiwara, Masafumi},
  booktitle = {Proceedings of the Ninth Asian Conference on Machine Learning},
  year      = {2017},
  pages     = {561--573},
  volume    = {77},
  url       = {https://mlanthology.org/acml/2017/ke2017acml-radicallevel/}
}