CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification

Qin, Zeqing; Wu, Yiwei; Han, Lansheng

doi:10.1609/AAAI.V39I23.34689

AAAI 2025 pp. 25047-25055

Abstract

Large Language Models (LLMs) have shown great promise in vulnerability identification. Because C/C++ accounts for half of the open-source software (OSS) vulnerabilities reported over the past decade, and OSS updates occur mainly through commits, enhancing LLMs' ability to identify C/C++ Vulnerability-Contributing Commits (VCCs) is essential. However, current studies primarily focus on further pre-training LLMs on massive code datasets, which is resource-intensive and poses efficiency challenges. In this paper, we enhance the ability of BERT-based LLMs to identify C/C++ VCCs in a lightweight manner. We propose CodeLinguaNexus (CLNX) as a bridge that facilitates communication between C/C++ programs and LLMs. Working from commits, CLNX efficiently converts the source code into a more natural representation while preserving key details. Specifically, CLNX first applies Structure-level Naturalization to decompose complex programs, followed by Token-level Naturalization to interpret complex symbols. We evaluate CLNX on public datasets of 25,872 C/C++ functions with their commits. The results demonstrate that CLNX substantially improves the ability of LLMs to detect C/C++ VCCs. Moreover, CLNX-equipped CodeBERT achieves new state-of-the-art performance and identifies 38 real-world OSS vulnerabilities.
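
As a rough illustration of what Token-level Naturalization might look like, the sketch below rewrites a few C/C++ operators as English words before the text is handed to a BERT-style tokenizer. The mapping table `SYMBOL_WORDS`, the function name `naturalize_tokens`, and the chosen phrases are all hypothetical. The abstract does not specify the actual rules used by CLNX, so this is a toy example under those assumptions, not the authors' implementation.

```python
import re

# Hypothetical symbol-to-phrase table (an assumption for illustration only;
# the abstract does not list the mapping CLNX actually uses).
SYMBOL_WORDS = {
    "->": "member of pointer",
    "==": "equals",
    "!=": "not equals",
    "&&": "and",
    "||": "or",
    "++": "increment",
    "--": "decrement",
    "=": "assign",
    "<": "less than",
    ">": "greater than",
}

# Try longer operators first so "==" is not split into two "=" tokens.
_OPS = sorted(SYMBOL_WORDS, key=len, reverse=True)

# Token order: identifiers, integer literals, known operators, any other
# single non-space character (parentheses, semicolons, and so on).
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+|" + "|".join(re.escape(op) for op in _OPS) + r"|\S"
)


def naturalize_tokens(code: str) -> str:
    """Replace known C/C++ operators with English phrases, token by token."""
    tokens = TOKEN_RE.findall(code)
    return " ".join(SYMBOL_WORDS.get(tok, tok) for tok in tokens)
```

For example, `naturalize_tokens("if (p != NULL && n == 0)")` yields `if ( p not equals NULL and n equals 0 )`. Identifiers and literals pass through unchanged, which loosely mirrors the stated goal of a more natural representation that still preserves key details.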

Cite

Text

Qin et al. "CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I23.34689

Markdown

[Qin et al. "CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/qin2025aaai-clnx/) doi:10.1609/AAAI.V39I23.34689

BibTeX

@inproceedings{qin2025aaai-clnx,
  title     = {{CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification}},
  author    = {Qin, Zeqing and Wu, Yiwei and Han, Lansheng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {25047--25055},
  doi       = {10.1609/AAAI.V39I23.34689},
  url       = {https://mlanthology.org/aaai/2025/qin2025aaai-clnx/}
}