From Pseudo-Code to Source Code: A Self-Supervised Search Approach
Abstract
Identifying algorithm implementations in source code is crucial for code comprehension, reference retrieval, and program synthesis. This paper presents PC2SC, a novel framework for mapping pseudo-code to source code without manual annotations. We introduce p-language, a structured representation that encodes control flow, mathematical expressions, and natural language descriptions of algorithms. A static analyzer extracts these features and converts pseudo-code into p-code, which is then embedded into a shared vector space with source code using self-supervised learning for retrieval. Given pseudo-code as input, PC2SC returns a ranked list of matching code snippets. Evaluations on the Stony Brook Algorithm Repository and GitHub projects demonstrate that PC2SC outperforms state-of-the-art code search tools in both C and Java. It retrieves correct implementations within the top 25, 10, and 1 ranked results for 98.5%, 93.8%, and 66.2% of queries, respectively. In GitHub projects, it identified 74 algorithm implementations out of 87 queries. PC2SC bridges the gap between algorithmic descriptions and executable implementations, offering a scalable, language-independent solution for algorithm retrieval and paving the way for future advancements in cross-language code search and automated synthesis.
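The retrieval and evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes that pseudo-code queries and code snippets have already been embedded into a shared vector space; the vectors, file names, and the functions `rank_snippets` and `top_k_hit_rate` below are hypothetical and are not taken from PC2SC. Cosine similarity is used as an assumed scoring function, since the abstract does not state which one the authors use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_snippets(query_vec, snippet_vecs):
    """Return snippet ids sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), sid) for sid, vec in snippet_vecs.items()]
    # Sort by score (high to low); break ties by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sid for _, sid in scored]


def top_k_hit_rate(rankings, gold, k):
    """Fraction of queries whose correct snippet appears in the top k results."""
    hits = sum(1 for q, ranked in rankings.items() if gold[q] in ranked[:k])
    return hits / len(rankings)


# Toy, hand-made embeddings (illustrative only, not real model output).
snippets = {
    "bfs.c": [0.9, 0.1, 0.0],
    "quicksort.c": [0.1, 0.9, 0.2],
    "dijkstra.java": [0.7, 0.2, 0.6],
}
queries = {
    "breadth-first search": [1.0, 0.0, 0.1],
    "quick sort": [0.0, 1.0, 0.1],
    "shortest path": [0.9, 0.1, 0.2],
}
gold = {
    "breadth-first search": "bfs.c",
    "quick sort": "quicksort.c",
    "shortest path": "dijkstra.java",
}

rankings = {q: rank_snippets(vec, snippets) for q, vec in queries.items()}
for k in (1, 2):
    print(f"top-{k} hit rate: {top_k_hit_rate(rankings, gold, k):.3f}")
```

In this toy setup the "shortest path" query ranks its correct snippet second, so the top-1 hit rate is lower than the top-2 hit rate. The reported top-25, top-10, and top-1 percentages are metrics of this kind, computed over the paper's own queries and corpora.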
Cite
Text
Kulkarni et al. "From Pseudo-Code to Source Code: A Self-Supervised Search Approach." ICLR 2025 Workshops: DL4C, 2025.
Markdown
[Kulkarni et al. "From Pseudo-Code to Source Code: A Self-Supervised Search Approach." ICLR 2025 Workshops: DL4C, 2025.](https://mlanthology.org/iclrw/2025/kulkarni2025iclrw-pseudocode/)
BibTeX
@inproceedings{kulkarni2025iclrw-pseudocode,
title = {{From Pseudo-Code to Source Code: A Self-Supervised Search Approach}},
author = {Kulkarni, Adithya and Chakraborty, Mohna and Sium, Yonas Afewerki and Valluri, Sai Charishma and Le, Wei and Li, Qi},
booktitle = {ICLR 2025 Workshops: DL4C},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/kulkarni2025iclrw-pseudocode/}
}