A Word Selection Method for Producing Interpretable Distributional Semantic Word Vectors
Abstract
Distributional semantic models represent the meaning of words as vectors. We introduce a selection method that learns a vector space in which each dimension is a natural word. The method starts from the most frequent words in the corpus and selects the subset that performs best. Because every dimension of the resulting space is itself a word, the representation is directly interpretable; this is the main advantage of the method over factorization methods such as NMF and over neural embedding models. We apply the method to the ukWaC corpus and train a vector space with N=1500 basis words. We report test results on the MEN, RG-65, SimLex-999, and WordSim353 word-similarity gold datasets. The results show that reducing the number of basis vectors from 5000 to 1500 lowers accuracy by only about 1.5-2%, so we achieve good interpretability without a large performance penalty. Interpretability evaluations indicate that the word vectors obtained by the proposed method with N=1500 are more interpretable than those of word embedding models and the baseline method. We also report the top 15 words of the 1500 selected basis words in this paper.
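As a rough illustration of the selection idea sketched in the abstract, the snippet below greedily picks basis words (starting from the most frequent candidates) and keeps a column only when it improves a word-similarity score. This is a minimal sketch under assumed inputs, not the authors' implementation: `cooc`, `similarity_score`, `candidate_cols`, and `gold_pairs` are all hypothetical names, and the exact selection criterion in the paper may differ.

```python
# Hypothetical sketch of frequency-ordered greedy basis-word selection.
# Not the authors' code; the paper's actual procedure may differ.
import numpy as np
from scipy.stats import spearmanr

def similarity_score(vectors, vocab_index, gold_pairs):
    """Spearman correlation between cosine similarities of the given
    word vectors and human judgments (e.g. pairs from MEN or RG-65)."""
    preds, golds = [], []
    for w1, w2, gold in gold_pairs:
        if w1 in vocab_index and w2 in vocab_index:
            v1, v2 = vectors[vocab_index[w1]], vectors[vocab_index[w2]]
            denom = np.linalg.norm(v1) * np.linalg.norm(v2)
            if denom > 0:
                preds.append(float(v1 @ v2) / denom)
                golds.append(gold)
    return spearmanr(preds, golds).correlation

def select_basis_words(cooc, vocab_index, candidate_cols, gold_pairs,
                       n_basis=1500):
    """Greedy forward selection over candidate basis words.
    `cooc` is a (vocabulary x candidates) weighted co-occurrence matrix;
    `candidate_cols` lists candidate columns ordered by corpus frequency.
    Each retained column corresponds to a natural, human-readable word."""
    selected = []
    for col in candidate_cols:
        base = (similarity_score(cooc[:, selected], vocab_index, gold_pairs)
                if selected else -1.0)
        trial = selected + [col]
        # Keep the candidate only if it improves the development score.
        if similarity_score(cooc[:, trial], vocab_index, gold_pairs) > base:
            selected = trial
        if len(selected) == n_basis:
            break
    return selected
```

Because the selected dimensions remain actual words, a large value on a dimension can be read directly as "co-occurs strongly with that word", which is the interpretability property the paper targets.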
Cite
Text
Pakzad and Analoui. "A Word Selection Method for Producing Interpretable Distributional Semantic Word Vectors." Journal of Artificial Intelligence Research, 2021. doi:10.1613/JAIR.1.13353
Markdown
[Pakzad and Analoui. "A Word Selection Method for Producing Interpretable Distributional Semantic Word Vectors." Journal of Artificial Intelligence Research, 2021.](https://mlanthology.org/jair/2021/pakzad2021jair-word/) doi:10.1613/JAIR.1.13353
BibTeX
@article{pakzad2021jair-word,
title = {{A Word Selection Method for Producing Interpretable Distributional Semantic Word Vectors}},
author = {Pakzad, Atefe and Analoui, Morteza},
journal = {Journal of Artificial Intelligence Research},
year = {2021},
pages = {1281--1305},
doi = {10.1613/JAIR.1.13353},
volume = {72},
url = {https://mlanthology.org/jair/2021/pakzad2021jair-word/}
}