Lexical Sememe Prediction via Word Embeddings and Matrix Factorization
Abstract
Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words and built linguistic knowledge bases from them. However, manual construction is time-consuming and labor-intensive, and it suffers from significant annotation inconsistency and noise. In this paper, we explore for the first time the automatic prediction of lexical sememes based on the semantic meanings of words encoded by word embeddings. Moreover, we apply matrix factorization to learn semantic relations between sememes and words. In experiments, we use the real-world sememe knowledge base HowNet for training and evaluation, and the results demonstrate the effectiveness of our method for lexical sememe prediction. Our method will be of great use for annotation verification of existing noisy sememe knowledge bases and for annotation suggestion for new words and phrases.
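The abstract does not give the scoring model, so the following is only a minimal sketch of the embedding-based idea, assuming a collaborative-filtering-style rule: a new word inherits sememes from already-annotated words in proportion to their cosine similarity. The function name `predict_sememes`, the toy vectors, and the toy word-sememe matrix are all hypothetical and are not taken from the paper or from HowNet; the matrix factorization component is not covered here.

```python
import numpy as np

def predict_sememes(query_vec, word_vecs, word_sememe, top_k=2):
    """Rank sememes for a query word by similarity-weighted votes.

    query_vec:   (d,) embedding of the word to annotate
    word_vecs:   (n, d) embeddings of already-annotated words
    word_sememe: (n, m) binary matrix, 1 if word i carries sememe j
    """
    # Cosine similarity between the query and every annotated word.
    norms = np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (word_vecs @ query_vec) / norms
    # Each annotated word votes for its own sememes, weighted by similarity.
    scores = sims @ word_sememe
    # Indices of the top_k highest-scoring sememes, best first.
    return np.argsort(-scores)[:top_k], scores

# Toy data (assumed for illustration): three annotated words, three sememes.
word_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
word_sememe = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0]])
query = np.array([1.0, 0.05])  # lies close to the first two words

top, scores = predict_sememes(query, word_vecs, word_sememe)
print(top.tolist())  # sememe indices ranked by score
```

In this toy setup the query sits near the first two words, so the sememes they carry outrank the one attached only to the dissimilar third word.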
Cite
Text
Xie et al. "Lexical Sememe Prediction via Word Embeddings and Matrix Factorization." International Joint Conference on Artificial Intelligence, 2017. doi:10.24963/IJCAI.2017/587
Markdown
[Xie et al. "Lexical Sememe Prediction via Word Embeddings and Matrix Factorization." International Joint Conference on Artificial Intelligence, 2017.](https://mlanthology.org/ijcai/2017/xie2017ijcai-lexical/) doi:10.24963/IJCAI.2017/587
BibTeX
@inproceedings{xie2017ijcai-lexical,
title = {{Lexical Sememe Prediction via Word Embeddings and Matrix Factorization}},
author = {Xie, Ruobing and Yuan, Xingchi and Liu, Zhiyuan and Sun, Maosong},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2017},
pages = {4200-4206},
doi = {10.24963/IJCAI.2017/587},
url = {https://mlanthology.org/ijcai/2017/xie2017ijcai-lexical/}
}