From Pseudo-Code to Source Code: A Self-Supervised Search Approach

Abstract

Identifying algorithm implementations in source code is crucial for code comprehension, reference retrieval, and program synthesis. This paper presents PC2SC, a novel framework for mapping pseudo-code to source code without manual annotations. We introduce p-language, a structured representation that encodes control flow, mathematical expressions, and natural language descriptions of algorithms. A static analyzer extracts these features and converts pseudo-code into p-code, which is then embedded into a shared vector space with source code using self-supervised learning for retrieval. Given pseudo-code as input, PC2SC returns a ranked list of matching code snippets. Evaluations on the Stony Brook Algorithm Repository and GitHub projects demonstrate that PC2SC outperforms state-of-the-art code search tools in both C and Java. It retrieves correct implementations within the top 25, 10, and 1 ranked results for 98.5%, 93.8%, and 66.2% of queries, respectively. In GitHub projects, it identified 74 algorithm implementations out of 87 queries. PC2SC bridges the gap between algorithmic descriptions and executable implementations, offering a scalable, language-independent solution for algorithm retrieval and paving the way for future advancements in cross-language code search and automated synthesis.
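The retrieval and evaluation steps described in the abstract (ranking snippets in a shared vector space, then counting how often the correct implementation appears in the top k) can be illustrated with a minimal sketch. This is not PC2SC's implementation: the vectors are hand-made toy values standing in for learned embeddings, cosine similarity is an assumed scoring function, and the names `cosine`, `rank_snippets`, `top_k_hit_rate`, and the file names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_snippets(query_vec, corpus):
    """Return snippet ids sorted by descending similarity to the query vector."""
    return sorted(corpus, key=lambda sid: cosine(query_vec, corpus[sid]), reverse=True)

def top_k_hit_rate(queries, corpus, k):
    """Fraction of queries whose correct snippet id appears among the top k results."""
    hits = sum(1 for vec, gold in queries if gold in rank_snippets(vec, corpus)[:k])
    return hits / len(queries)

# Toy vectors (assumption): stand-ins for source-code embeddings.
corpus = {
    "binary_search.c": [0.9, 0.1, 0.0],
    "quicksort.c": [0.1, 0.9, 0.2],
    "dijkstra.java": [0.0, 0.2, 0.9],
}

# Each query pairs a toy pseudo-code embedding with the id of its correct snippet.
queries = [
    ([1.0, 0.0, 0.0], "binary_search.c"),
    ([0.0, 1.0, 0.0], "quicksort.c"),
    ([0.5, 0.6, 0.0], "binary_search.c"),  # ambiguous query: correct snippet ranks second
]

for k in (1, 2):
    print(f"top-{k} hit rate: {top_k_hit_rate(queries, corpus, k):.3f}")
```

On this toy data the third query ranks the wrong snippet first, so the top-1 hit rate is lower than the top-2 hit rate, which mirrors how the reported top-1, top-10, and top-25 figures grow as k increases.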

Cite

Text

Kulkarni et al. "From Pseudo-Code to Source Code: A Self-Supervised Search Approach." ICLR 2025 Workshops: DL4C, 2025.

Markdown

[Kulkarni et al. "From Pseudo-Code to Source Code: A Self-Supervised Search Approach." ICLR 2025 Workshops: DL4C, 2025.](https://mlanthology.org/iclrw/2025/kulkarni2025iclrw-pseudocode/)

BibTeX

@inproceedings{kulkarni2025iclrw-pseudocode,
  title     = {{From Pseudo-Code to Source Code: A Self-Supervised Search Approach}},
  author    = {Kulkarni, Adithya and Chakraborty, Mohna and Sium, Yonas Afewerki and Valluri, Sai Charishma and Le, Wei and Li, Qi},
  booktitle = {ICLR 2025 Workshops: DL4C},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/kulkarni2025iclrw-pseudocode/}
}