TaxoRef: Embeddings Evaluation for AI-Driven Taxonomy Refinement
Abstract
Taxonomies provide a structured representation of semantic relations between lexical terms. In the case of standard official taxonomies, the refinement task consists of keeping them up to date over time while preserving their original structure. To date, most approaches to automated taxonomy refinement rely on word vector models. However, none of them considers to what extent those models encode the taxonomic similarity between words. Motivated by this, we propose and implement TaxoRef, a methodology that (i) synthesises the semantic similarity between taxonomic elements through a new metric, namely HSS, (ii) evaluates to what extent the embeddings generated from a text corpus preserve those similarity relations, and (iii) uses the best embedding resulting from this evaluation to perform taxonomy refinement. TaxoRef is part of the research activity of a 4-year EU project that collects and classifies millions of Online Job Ads for the 27+1 EU countries. It has been tested on 2M ICT job ads classified over ESCO, the European standard occupation and skill taxonomy. Experimental results confirm that (i) HSS outperforms previous metrics for semantic similarity in taxonomies, and (ii) TaxoRef accurately encodes similarities among occupations, suggesting a refinement strategy.
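Steps (ii) and (iii) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define HSS, so the function `taxo_sim` below uses a simple path-length similarity as a stand-in, and the toy taxonomy, the candidate embeddings, and all function names are hypothetical. The sketch scores each candidate embedding by the Spearman rank correlation between its cosine similarities and the taxonomic similarities, then keeps the highest-scoring one.

```python
from itertools import combinations
from math import sqrt

# Toy taxonomy as child -> parent links (hypothetical, not ESCO data).
PARENT = {
    "ict": None,
    "developers": "ict",
    "analysts": "ict",
    "web_dev": "developers",
    "app_dev": "developers",
    "data_analyst": "analysts",
}


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def taxo_sim(a, b):
    """Stand-in taxonomic similarity, 1 / (1 + edges between a and b). NOT HSS."""
    pa, pb = ancestors(a), ancestors(b)
    common = next(n for n in pa if n in pb)  # lowest common ancestor
    return 1.0 / (1 + pa.index(common) + pb.index(common))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def score(embedding, terms):
    """How well an embedding preserves the taxonomic similarity ordering."""
    pairs = list(combinations(terms, 2))
    gold = [taxo_sim(a, b) for a, b in pairs]
    pred = [cosine(embedding[a], embedding[b]) for a, b in pairs]
    return spearman(gold, pred)


def best_embedding(candidates, terms):
    """Name of the candidate embedding with the highest score."""
    return max(candidates, key=lambda name: score(candidates[name], terms))


# Hypothetical candidates: "good" places the two developer roles close together.
TERMS = ["web_dev", "app_dev", "data_analyst"]
CANDIDATES = {
    "good": {"web_dev": [1.0, 0.0], "app_dev": [0.9, 0.1], "data_analyst": [0.0, 1.0]},
    "bad": {"web_dev": [1.0, 0.0], "app_dev": [0.0, 1.0], "data_analyst": [0.9, 0.1]},
}

if __name__ == "__main__":
    for name, emb in CANDIDATES.items():
        print(name, round(score(emb, TERMS), 3))
    print("selected:", best_embedding(CANDIDATES, TERMS))
```

Rank correlation is used here because it compares orderings rather than raw values, so the two similarity scales do not need to match. In practice the taxonomic similarity would be replaced by the paper's HSS metric, and the candidates by embeddings trained on the job-ad corpus.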
Cite
Text
Malandri et al. "TaxoRef: Embeddings Evaluation for AI-Driven Taxonomy Refinement." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86523-8_37
Markdown
[Malandri et al. "TaxoRef: Embeddings Evaluation for AI-Driven Taxonomy Refinement." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/malandri2021ecmlpkdd-taxoref/) doi:10.1007/978-3-030-86523-8_37
BibTeX
@inproceedings{malandri2021ecmlpkdd-taxoref,
title = {{TaxoRef: Embeddings Evaluation for AI-Driven Taxonomy Refinement}},
author = {Malandri, Lorenzo and Mercorio, Fabio and Mezzanzanica, Mario and Nobani, Navid},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2021},
pages = {612--627},
doi = {10.1007/978-3-030-86523-8_37},
url = {https://mlanthology.org/ecmlpkdd/2021/malandri2021ecmlpkdd-taxoref/}
}