Discovering Program Topoi Through Clustering

Abstract

Understanding the source code of large open-source software projects is very challenging when little documentation is available. New developers must classify a huge number of files and functions without any help. This paper presents a novel approach to this problem, called FEAT, that automatically extracts topoi from source code using hierarchical agglomerative clustering. Program topoi summarize the main capabilities of a software system by presenting developers with clustered lists of functions together with an index of their relevant words. The clustering method used in FEAT relies on a new hybrid distance that combines textual and structural elements automatically extracted from source code and comments. The experimental evaluation shows that FEAT is suitable for understanding open-source software projects approaching 2,000 functions and 150 files in size, which opens the door to its deployment in the open-source community.
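
To make the idea of clustering functions under a hybrid distance concrete, the following is a minimal sketch, not FEAT's actual implementation. The abstract does not define the distance, so everything here is an assumption for illustration: the textual part is a Jaccard distance over word sets, the structural part is a Jaccard distance over sets of called functions, the two are mixed with a hypothetical weight `alpha`, and clusters are merged with average linkage. The function names and the toy data are likewise invented.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def hybrid_distance(f, g, alpha=0.5):
    """Weighted mix of a textual and a structural distance.

    Assumption: both parts are Jaccard distances and `alpha` is a
    hypothetical mixing weight; the paper's definition may differ.
    """
    textual = jaccard_distance(f["words"], g["words"])
    structural = jaccard_distance(f["calls"], g["calls"])
    return alpha * textual + (1.0 - alpha) * structural


def agglomerative(functions, n_clusters, alpha=0.5):
    """Naive average-linkage agglomerative clustering over function names."""
    clusters = [[name] for name in functions]

    def linkage(c1, c2):
        # Average pairwise hybrid distance between two clusters.
        total = sum(
            hybrid_distance(functions[a], functions[b], alpha)
            for a in c1
            for b in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (the first such pair on ties).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy input: words taken from identifiers/comments, plus called functions.
functions = {
    "parse_header": {"words": {"parse", "header", "token"}, "calls": {"read_token"}},
    "parse_body": {"words": {"parse", "body", "token"}, "calls": {"read_token"}},
    "draw_window": {"words": {"draw", "window", "pixel"}, "calls": {"set_pixel"}},
    "draw_button": {"words": {"draw", "button", "pixel"}, "calls": {"set_pixel"}},
}

clusters = agglomerative(functions, n_clusters=2)
print(clusters)
```

On this toy input the parsing functions and the drawing functions end up in separate clusters, since each pair shares both vocabulary and callees. A real pipeline would stop merging based on a cut of the dendrogram rather than a fixed `n_clusters`, and would use a more efficient linkage update than this quadratic recomputation.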

Cite

Text

Ieva et al. "Discovering Program Topoi Through Clustering." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.11405

Markdown

[Ieva et al. "Discovering Program Topoi Through Clustering." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/ieva2018aaai-discovering/) doi:10.1609/AAAI.V32I1.11405

BibTeX

@inproceedings{ieva2018aaai-discovering,
  title     = {{Discovering Program Topoi Through Clustering}},
  author    = {Ieva, Carlo and Gotlieb, Arnaud and Kaci, Souhila and Lazaar, Nadjib},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {7771-7778},
  doi       = {10.1609/AAAI.V32I1.11405},
  url       = {https://mlanthology.org/aaai/2018/ieva2018aaai-discovering/}
}