Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

Abstract

Several recently released Code Large Language Models (Code LLMs) have been trained on repository-level code data, enabling them to perceive repository structure and utilize cross-file information. This capability makes it possible to concatenate the contents of a repository's code files directly into the prompt to achieve repository-level code completion. However, in real development scenarios, concatenating all of a repository's files can easily exceed the context window of Code LLMs, causing a significant decline in completion performance. Overly long prompts also increase completion latency, degrading the user experience. In this study, we conducted extensive experiments, including completion error analysis, topology dependency analysis, and cross-file content analysis, to investigate the factors affecting repository-level code completion. Based on the conclusions drawn from these preliminary experiments, we propose a strategy called **Hierarchical Context Pruning (HCP)** to construct high-quality completion prompts. We applied HCP to six Code LLMs and evaluated them on the CrossCodeEval dataset. Compared with previous methods, prompts constructed with HCP achieved higher completion accuracy on five of the six Code LLMs. HCP also kept prompts to around 8k tokens (versus roughly 50k tokens for the full repository code), significantly improving completion throughput. Our code and data will be publicly available.
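The abstract's core idea, selecting cross-file context by dependency structure so the prompt stays within a fixed token budget, can be illustrated with a toy sketch. Everything here is an illustrative assumption, not the paper's implementation: the function names, the 4-characters-per-token estimate, and the use of plain BFS distance over an import graph as the "hierarchy".

```python
from collections import deque


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption).
    return max(1, len(text) // 4)


def prune_context(current_file, repo_files, deps, budget=8000):
    """Greedily select cross-file context within `budget` tokens.

    repo_files: dict mapping path -> file content
    deps: dict mapping path -> set of imported paths (dependency graph)
    """
    # BFS from the file being completed to compute dependency distance.
    dist = {current_file: 0}
    queue = deque([current_file])
    while queue:
        node = queue.popleft()
        for nbr in deps.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    # Closer files first; files unreachable in the dependency graph last.
    ranked = sorted(
        (p for p in repo_files if p != current_file),
        key=lambda p: dist.get(p, float("inf")),
    )

    kept, used = [], 0
    for path in ranked:
        cost = estimate_tokens(repo_files[path])
        if used + cost > budget:
            continue  # skip files that would overflow the budget
        kept.append(path)
        used += cost
    return kept
```

A prompt would then be assembled from the `kept` files plus the in-file context, keeping total length near the budget regardless of repository size.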

Cite

Text

Zhang et al. "Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I24.34782

Markdown

[Zhang et al. "Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhang2025aaai-hierarchical/) doi:10.1609/AAAI.V39I24.34782

BibTeX

@inproceedings{zhang2025aaai-hierarchical,
  title     = {{Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs}},
  author    = {Zhang, Lei and Li, Yunshui and Li, Jiaming and Xia, Xiaobo and Yang, Jiaxi and Luo, Run and Wang, Minzheng and Chen, Longze and Liu, Junhao and Qu, Qiang and Yang, Min},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {25886--25894},
  doi       = {10.1609/AAAI.V39I24.34782},
  url       = {https://mlanthology.org/aaai/2025/zhang2025aaai-hierarchical/}
}