Code Summarization: Do Transformers Really Understand Code?

Abstract

Recent approaches to automatic code summarization rely on fine-tuned transformer-based language models, often injected with program-analysis information. We perform empirical studies to analyze the extent to which these models understand the code they attempt to summarize. We observe that the models rely heavily on textual cues present in comments, function names, and variable names, and that masking this information degrades the generated summaries. Further, subtle code transformations that drastically alter program logic have no corresponding impact on the generated summaries. Overall, the quality of the summaries generated even by state-of-the-art (SOTA) models is quite poor, raising questions about the utility of current approaches and datasets.
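To make the "masking textual cues" probe concrete, here is a minimal sketch of one way such a transformation could be implemented: replacing every identifier in a Python snippet with a generic placeholder so that a summarization model can no longer lean on descriptive naming. This is an illustrative assumption about the setup, not the paper's actual masking procedure; the function name `mask_identifiers` and the `VAR<n>` scheme are hypothetical.

```python
import io
import keyword
import tokenize

def mask_identifiers(source: str) -> str:
    """Replace every identifier in Python source with a generic token.

    Names like `sum_list` become VAR0, VAR1, ... (consistently, so the
    program still parses), removing the natural-language cues a
    summarizer might exploit. Keywords are left intact.
    """
    mapping: dict[str, str] = {}
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Assign each distinct identifier a stable generic name.
            if tok.string not in mapping:
                mapping[tok.string] = f"VAR{len(mapping)}"
            result.append((tok.type, mapping[tok.string]))
        else:
            result.append((tok.type, tok.string))
    return tokenize.untokenize(result)

code = (
    "def sum_list(numbers):\n"
    "    total = 0\n"
    "    for n in numbers:\n"
    "        total += n\n"
    "    return total\n"
)
masked = mask_identifiers(code)
```

The masked output still compiles and computes the same function, which is exactly the point: if a model's summary changes substantially after such a semantics-preserving rename, the summary was driven by surface text rather than program logic.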

Cite

Text

Sontakke et al. "Code Summarization: Do Transformers Really Understand Code?" ICLR 2022 Workshops: DL4C, 2022.

Markdown

[Sontakke et al. "Code Summarization: Do Transformers Really Understand Code?" ICLR 2022 Workshops: DL4C, 2022.](https://mlanthology.org/iclrw/2022/sontakke2022iclrw-code/)

BibTeX

@inproceedings{sontakke2022iclrw-code,
  title     = {{Code Summarization: Do Transformers Really Understand Code?}},
  author    = {Sontakke, Ankita Nandkishor and Patwardhan, Manasi and Vig, Lovekesh and Medicherla, Raveendra Kumar and Naik, Ravindra and Shroff, Gautam},
  booktitle = {ICLR 2022 Workshops: DL4C},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/sontakke2022iclrw-code/}
}