Code Summarization with Project-Specific Features

Abstract

Code summarization aims to automatically generate natural language descriptions of code snippets, which helps developers understand and maintain code. Most existing code summarization methods are based on the encoder-decoder structure, where the encoder learns latent features from a code snippet and the decoder generates the corresponding summary from those features. Such methods do not leverage project-specific information and tend to generate generic summaries. In practice, however, developers want the generated summaries to be project-specific, i.e., consistent with the existing summaries in the same project in aspects such as sentence patterns and domain concepts. In this work, we investigate project-specific code summarization. We propose a two-stage method, CSWPS, which can be seamlessly integrated into any existing encoder-decoder summarization model. In the first stage, CSWPS learns project-specific features from the existing summaries in each project using multi-task learning. In the second stage, CSWPS samples from the project-specific features conditioned on the input source code and project information, and extracts the features most relevant to the input code. These features guide the decoder to generate a project-specific summary for the input code. Incorporating CSWPS into existing code summarization models consistently improves their performance and achieves new state-of-the-art results. We also show empirically, via feature visualization and a human study, that the summaries generated with CSWPS are more project-specific. A replication package for this work is available at https://github.com/DaSESmartEdu/CSWPS.
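The second stage described above, extracting the project features most relevant to an input code snippet, can be illustrated with a minimal sketch. This is not the CSWPS implementation: the abstract does not specify the sampling mechanism, so the sketch substitutes plain dot-product soft attention, and the names `select_project_features`, `feature_banks`, and `code_vec`, along with the toy vectors, are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_project_features(code_vec, feature_bank):
    """Weight each stored project feature by its similarity to the code encoding.

    Assumption: relevance is a dot product followed by softmax (soft attention),
    standing in for whatever conditioning the actual method uses.
    """
    scores = [sum(c * f for c, f in zip(code_vec, feat)) for feat in feature_bank]
    weights = softmax(scores)
    dim = len(code_vec)
    # Weighted sum of the project features: the vector that would guide a decoder.
    guidance = [
        sum(w * feat[i] for w, feat in zip(weights, feature_bank))
        for i in range(dim)
    ]
    return weights, guidance

# Hypothetical toy data: one feature bank per project, as if learned in stage one.
feature_banks = {
    "project_a": [[1.0, 0.0], [0.0, 1.0]],
}
code_vec = [2.0, 0.0]  # hypothetical encoder output for one code snippet

weights, guidance = select_project_features(code_vec, feature_banks["project_a"])
```

In this toy setup the first stored feature aligns with the code encoding, so it receives the larger weight and dominates the guidance vector that an encoder-decoder model would then consume alongside its usual encoder output.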

Cite

Text

Wang et al. "Code Summarization with Project-Specific Features." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70378-2_12

Markdown

[Wang et al. "Code Summarization with Project-Specific Features." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/wang2024ecmlpkdd-code/) doi:10.1007/978-3-031-70378-2_12

BibTeX

@inproceedings{wang2024ecmlpkdd-code,
  title     = {{Code Summarization with Project-Specific Features}},
  author    = {Wang, Yu and Liu, Xin and Lu, Xuesong and Zhou, Aoying},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {190--206},
  doi       = {10.1007/978-3-031-70378-2_12},
  url       = {https://mlanthology.org/ecmlpkdd/2024/wang2024ecmlpkdd-code/}
}