Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray

ICLR 2024

Abstract

Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While the conventional accuracy evaluates the performance of Code LLMs on a set of individual tasks, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications for its own code and generating code for its own specifications. Failure to preserve self-consistency reveals a lack of understanding of the shared semantics underlying natural language and programming language, and therefore undermines the trustworthiness of a model. In this paper, we first formally define the self-consistency of Code LLMs and then design a framework, IdentityChain, which effectively and efficiently evaluates the self-consistency and conventional accuracy of a model at the same time. We study eleven Code LLMs and show that they fail to preserve self-consistency, which is indeed a distinct aspect from conventional accuracy. Furthermore, we show that IdentityChain can be used as a model debugging tool to expose weaknesses of Code LLMs by demonstrating three major weaknesses that we identify in current models using IdentityChain. Our code is available at https://github.com/marcusm117/IdentityChain.
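
The idea of self-consistency described in the abstract can be illustrated with a minimal sketch. This is not the IdentityChain implementation. The function names (`nl_to_pl`, `pl_to_nl`, `self_consistent`), the toy stand-in "models", and the choice to compare program outputs on a handful of inputs are all assumptions made for illustration only. The sketch alternates between generating code from a specification and generating a specification from that code, and reports whether program behavior stays the same along the chain.

```python
from typing import Any, Callable, List, Sequence

Program = Callable[[Any], Any]


def self_consistent(
    nl_to_pl: Callable[[str], Program],   # hypothetical: spec -> program
    pl_to_nl: Callable[[Program], str],   # hypothetical: program -> spec
    spec0: str,
    inputs: Sequence[Any],
    chain_length: int = 3,
) -> bool:
    """Return True if every program in the chain behaves like the first one.

    Behavior is approximated by outputs on `inputs` (an assumption of this
    sketch, not necessarily the equivalence check used in the paper).
    """
    spec = spec0
    reference: List[Any] = []
    for step in range(chain_length):
        program = nl_to_pl(spec)                 # NL -> PL
        outputs = [program(x) for x in inputs]   # observe behavior
        if step == 0:
            reference = outputs                  # behavior of the first program
        elif outputs != reference:
            return False                         # semantics drifted
        spec = pl_to_nl(program)                 # PL -> NL, feeds the next step
    return True


# Toy stand-ins for a model; a real evaluation would call a Code LLM here.
def toy_nl_to_pl(spec: str) -> Program:
    if "double" in spec:
        return lambda x: 2 * x
    return lambda x: x + 1


def faithful_pl_to_nl(program: Program) -> str:
    # Describes the program correctly by probing it on one input.
    return "double the input" if program(3) == 6 else "increment the input"


def drifting_pl_to_nl(program: Program) -> str:
    # Always produces the same (possibly wrong) description.
    return "increment the input"


consistent = self_consistent(toy_nl_to_pl, faithful_pl_to_nl, "double the input", [0, 1, 5])
inconsistent = self_consistent(toy_nl_to_pl, drifting_pl_to_nl, "double the input", [0, 1, 5])
```

In this toy setup the first program is a correct implementation of its specification in both cases, yet the second chain breaks after one round trip. That mirrors the abstract's point that self-consistency is a separate property from conventional accuracy.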

Cite

Text

Min et al. "Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain." International Conference on Learning Representations, 2024.

Markdown

[Min et al. "Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/min2024iclr-beyond/)

BibTeX

@inproceedings{min2024iclr-beyond,
  title     = {{Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain}},
  author    = {Min, Marcus J. and Ding, Yangruibo and Buratti, Luca and Pujar, Saurabh and Kaiser, Gail and Jana, Suman and Ray, Baishakhi},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/min2024iclr-beyond/}
}