Binary–Source Code Matching Based on Decompilation Techniques and Graph Analysis
| Published in: | International Journal of Advanced Computer Science and Applications, vol. 16, no. 5 (2025) |
|---|---|
| Main author: | |
| Published: | Science and Information (SAI) Organization Limited |
| Subjects: | |
| Online link: | Citation/Abstract Full Text - PDF |
| Summary: | Recent approaches to binary–source code matching often operate at the intermediate representation (IR) level. Some instead match at the binary level, compiling the source code to binary and comparing it directly with the target binary code. Others, though less common, match at the decompiler-generated pseudo-code level, first decompiling the binary code into pseudo-code and then comparing it with the source code. However, all of these approaches are limited by the loss of semantic information present in the original source code and by the noise introduced during compilation and decompilation, which makes accurate matching challenging and often requires specialized expertise. To address these limitations, this study introduces a system for binary–source code matching based on decompilation techniques and graph analysis (BSMDG) that matches binary code with source code at the source code level. Our method uses the Ghidra decompiler in conjunction with a custom-built transpiler to reconstruct high-level C++ source code from binary executables. Call graphs (CGs) and control flow graphs (CFGs) are then generated for both the original and the translated code to evaluate their structural and semantic similarities. To evaluate our system, we used a curated dataset of C++ source code and corresponding binary files collected from the AtCoder website for training and testing. Additionally, a case study was conducted on the widely recognized POJ-104 benchmark dataset to assess the system's generalizability. The results demonstrate the effectiveness of combining decompilation with graph-based analysis, with our system achieving 90% accuracy on POJ-104, highlighting its potential in code clone detection, vulnerability identification, and reverse engineering tasks. |
|---|---|
| ISSN: | 2158-107X 2156-5570 |
| DOI: | 10.14569/IJACSA.2025.0160525 |
| Source: | Advanced Technologies & Aerospace Database |