ChatGPT-Generated Code Assignment Detection Using Perplexity of Large Language Models (Student Abstract)

Abstract

In the era of large language models like ChatGPT, maintaining academic integrity in programming education has become challenging due to potential misuse. There is a pressing need for reliable detectors to identify ChatGPT-generated code. While previous studies have tackled model-generated text detection, identifying such code remains largely unexplored. In this paper, we introduce a novel method to discern ChatGPT-generated code. We employ targeted masking perturbation, emphasizing code sections with high perplexity. A fine-tuned CodeBERT is then used to fill these masked sections, generating subtly perturbed samples. Our scoring system combines overall perplexity, variation in code-line perplexity, and burstiness. In this scoring scheme, a higher rank for the original code suggests it is more likely to be ChatGPT-generated. The underlying principle is that model-generated code typically exhibits consistent, low perplexity and reduced burstiness, and its ranking remains relatively stable even after subtle modifications. In contrast, perturbing human-written code is more likely to produce samples that the model prefers. Our approach significantly outperforms current detectors, especially against OpenAI's text-davinci-003 model, raising the average AUC from 0.56 (GPTZero baseline) to 0.87.
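The rank-based detection idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the score weights, the burstiness proxy, and the rank convention are assumptions, and in practice the line perplexities would come from a code language model rather than the toy values shown here.

```python
import statistics

def combined_score(line_ppls, w_mean=1.0, w_var=1.0, w_burst=1.0):
    """Combine overall perplexity, line-level perplexity variation, and
    burstiness into one score. Equal weights are a hypothetical choice."""
    mean_ppl = statistics.mean(line_ppls)             # overall perplexity
    variation = statistics.pstdev(line_ppls)          # spread across code lines
    burstiness = max(line_ppls) - min(line_ppls)      # simple burstiness proxy
    return w_mean * mean_ppl + w_var * variation + w_burst * burstiness

def original_rank(original_ppls, perturbed_ppls_list):
    """Rank of the original sample (1 = lowest combined score) among itself
    and its masked-and-refilled perturbed variants."""
    orig = combined_score(original_ppls)
    scores = [orig] + [combined_score(p) for p in perturbed_ppls_list]
    return sorted(scores).index(orig) + 1

# Model-like code: flat, low line perplexities stay lowest-ranked after perturbation.
flat = [2.0, 2.1, 2.0]
print(original_rank(flat, [[2.5, 2.6, 2.4], [2.3, 2.2, 2.5]]))

# Human-like code: high, bursty perplexities; perturbed variants score lower,
# pushing the original toward the last rank.
bursty = [3.0, 8.0, 2.0]
print(original_rank(bursty, [[2.0, 2.5, 2.2], [2.1, 2.3, 2.0]]))
```

With a rank convention where 1 means "the model prefers the original over its perturbations," a stable low rank points toward model-generated code, while a high rank points toward human-written code.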

Cite

Text

Xu et al. "ChatGPT-Generated Code Assignment Detection Using Perplexity of Large Language Models (Student Abstract)." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I21.30527

Markdown

[Xu et al. "ChatGPT-Generated Code Assignment Detection Using Perplexity of Large Language Models (Student Abstract)." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/xu2024aaai-chatgpt/) doi:10.1609/AAAI.V38I21.30527

BibTeX

@inproceedings{xu2024aaai-chatgpt,
  title     = {{ChatGPT-Generated Code Assignment Detection Using Perplexity of Large Language Models (Student Abstract)}},
  author    = {Xu, Zhenyu and Xu, Ruoyu and Sheng, Victor S.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {23688--23689},
  doi       = {10.1609/AAAI.V38I21.30527},
  url       = {https://mlanthology.org/aaai/2024/xu2024aaai-chatgpt/}
}