On Building an Interpretable Topic Modeling Approach for the Urdu Language

Nasim, Zarmeen

doi:10.24963/IJCAI.2020/740

IJCAI 2020 pp. 5200-5201

Abstract

This research combines deep-learning-based language modeling with classical topic modeling techniques to produce interpretable topics for a given set of documents in Urdu, a low-resource language. Existing topic modeling techniques output a collection of words as suggested topics, often uninterpretable, without integrating them into a semantically correct phrase or sentence. The proposed approach would first build an accurate Part of Speech (POS) tagger for the Urdu language using a publicly available corpus of many million sentences. In the next step, using semantically rich feature extraction approaches including Word2Vec and BERT, it would experiment with different clustering and topic modeling techniques to produce a list of potential topics for a given set of documents. Finally, this list of topics would be sent to a labeler module to produce syntactically correct phrases that represent interpretable topics.
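The stages named in the abstract (features, clustering, labeling) can be illustrated with a toy sketch. It is not the author's implementation, and everything in it is an assumption made for illustration: the function names are hypothetical, bag-of-words counts stand in for Word2Vec or BERT features, a plain k-means with fixed starting centroids stands in for the clustering and topic modeling techniques, English placeholder tokens with hand-written tags stand in for POS-tagged Urdu text, and the "most frequent adjective plus most frequent noun" rule is only one guess at what a labeler module might do.

```python
from collections import Counter

# Toy documents as (token, POS tag) pairs. English placeholders stand in for
# Urdu tokens, and the tags stand in for the output of a POS tagger.
DOCS = [
    [("national", "ADJ"), ("election", "NOUN"), ("vote", "NOUN")],
    [("national", "ADJ"), ("election", "NOUN"), ("party", "NOUN")],
    [("cricket", "NOUN"), ("final", "ADJ"), ("match", "NOUN")],
    [("final", "ADJ"), ("match", "NOUN"), ("team", "NOUN")],
]


def bag_of_words(doc, vocab):
    """Count vector over vocab; an assumed stand-in for Word2Vec or BERT features."""
    counts = Counter(token for token, _ in doc)
    return [counts[word] for word in vocab]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_idx, iterations=10):
    """Plain k-means with fixed initial centroids so the result is repeatable."""
    centroids = [list(vectors[i]) for i in init_idx]
    assignments = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def label_cluster(cluster_docs):
    """Hypothetical labeler: most frequent adjective followed by most frequent noun."""
    adjectives = Counter(t for d in cluster_docs for t, tag in d if tag == "ADJ")
    nouns = Counter(t for d in cluster_docs for t, tag in d if tag == "NOUN")
    words = [c.most_common(1)[0][0] for c in (adjectives, nouns) if c]
    return " ".join(words)


def run_pipeline(docs, init_idx):
    vocab = sorted({token for doc in docs for token, _ in doc})
    vectors = [bag_of_words(doc, vocab) for doc in docs]
    assignments = kmeans(vectors, init_idx)
    labels = []
    for c in range(len(init_idx)):
        cluster_docs = [d for d, a in zip(docs, assignments) if a == c]
        labels.append(label_cluster(cluster_docs))
    return assignments, labels


assignments, labels = run_pipeline(DOCS, init_idx=(0, 2))
print(assignments, labels)
```

On this toy input the two clusters are labeled "national election" and "final match", phrases rather than bare word lists, which is the kind of output the abstract calls an interpretable topic. A real system would replace each stand-in with the components the abstract names.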

Cite

Text

Nasim. "On Building an Interpretable Topic Modeling Approach for the Urdu Language." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/740

Markdown

[Nasim. "On Building an Interpretable Topic Modeling Approach for the Urdu Language." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/nasim2020ijcai-building/) doi:10.24963/IJCAI.2020/740

BibTeX

@inproceedings{nasim2020ijcai-building,
  title     = {{On Building an Interpretable Topic Modeling Approach for the Urdu Language}},
  author    = {Nasim, Zarmeen},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {5200-5201},
  doi       = {10.24963/IJCAI.2020/740},
  url       = {https://mlanthology.org/ijcai/2020/nasim2020ijcai-building/}
}