Effective String Processing and Matching for Author Disambiguation

Abstract

Track 2 of KDD Cup 2013 aims to identify duplicate authors in a data set from Microsoft Academic Search. This type of problem appears in many large-scale applications that compile information from different sources. This paper describes the solution we developed at National Taiwan University, which won first prize in the competition. We propose an effective name matching framework and realize two implementations. An important strategy in our approach is to consider Chinese and non-Chinese names separately because of their different naming conventions. Post-processing, including merging the results of two predictions, further boosts performance. Our approach achieves an F1-score of 0.99202 on the private leaderboard and 0.99195 on the public leaderboard.

Cite

Text

Chin et al. "Effective String Processing and Matching for Author Disambiguation." Journal of Machine Learning Research, 2014.

Markdown

[Chin et al. "Effective String Processing and Matching for Author Disambiguation." Journal of Machine Learning Research, 2014.](https://mlanthology.org/jmlr/2014/chin2014jmlr-effective/)

BibTeX

@article{chin2014jmlr-effective,
  title     = {{Effective String Processing and Matching for Author Disambiguation}},
  author    = {Chin, Wei-Sheng and Zhuang, Yong and Juan, Yu-Chin and Wu, Felix and Tung, Hsiao-Yu and Yu, Tong and Wang, Jui-Pin and Chang, Cheng-Xia and Yang, Chun-Pai and Chang, Wei-Cheng and Huang, Kuan-Hao and Kuo, Tzu-Ming and Lin, Shan-Wei and Lin, Young-San and Lu, Yu-Chen and Su, Yu-Chuan and Wei, Cheng-Kuang and Yin, Tu-Chun and Li, Chun-Liang and Lin, Ting-Wei and Tsai, Cheng-Hao and Lin, Shou-De and Lin, Hsuan-Tien and Lin, Chih-Jen},
  journal   = {Journal of Machine Learning Research},
  year      = {2014},
  pages     = {3037--3064},
  volume    = {15},
  url       = {https://mlanthology.org/jmlr/2014/chin2014jmlr-effective/}
}