Binary–Source Code Matching Based on Decompilation Techniques and Graph Analysis

Bibliographic details
Published in: International Journal of Advanced Computer Science and Applications vol. 16, no. 5 (2025)
Main author: PDF
Publisher: Science and Information (SAI) Organization Limited
Online access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3222641067
003 UK-CbPIL
022 |a 2158-107X 
022 |a 2156-5570 
024 7 |a 10.14569/IJACSA.2025.0160525  |2 doi 
035 |a 3222641067 
045 2 |b d20250101  |b d20251231 
100 1 |a PDF 
245 1 |a Binary–Source Code Matching Based on Decompilation Techniques and Graph Analysis 
260 |b Science and Information (SAI) Organization Limited  |c 2025 
513 |a Journal Article 
520 3 |a Recent approaches to binary–source code matching often operate at the intermediate representation (IR) level, with some applying the matching process at the binary level by compiling the source code to binary and then matching it directly with the binary code. Others, though less common, perform matching at the decompiler-generated pseudo-code level by first decompiling the binary code into pseudo-code and then comparing it with the source code. However, all these approaches are limited by the loss of semantic information in the original source code and the introduction of noise during compilation and decompilation, making accurate matching challenging and often requiring specialized expertise. To address these limitations, this study introduces a system for binary–source code matching based on decompilation techniques and Graph analysis (BSMDG) that matches binary code with source code at the source code level. Our method utilizes the Ghidra decompiler in conjunction with a custom-built transpiler to reconstruct high-level C++ source code from binary executables. Subsequently, call graphs (CGs) and control flow graphs (CFGs) are generated for both the original and translated code to evaluate their structural and semantic similarities. To evaluate our system, we used a curated dataset of C++ source code and corresponding binary files collected from the AtCoder website for training and testing. Additionally, a case study was conducted using the widely recognized POJ-104 benchmark dataset to assess the system's generalizability. The results demonstrate the effectiveness of combining decompilation with graph-based analysis, with our system achieving 90% accuracy on POJ-104, highlighting its potential in code clone detection, vulnerability identification, and reverse engineering tasks. 
610 4 |a King Abdulaziz University 
653 |a C++ (programming language) 
653 |a Datasets 
653 |a Semantics 
653 |a Source code 
653 |a Matching 
653 |a Binary codes 
653 |a Flow graphs 
653 |a Accuracy 
653 |a Computer science 
653 |a Reverse engineering 
653 |a C plus plus 
653 |a Cloning 
653 |a Graph representations 
653 |a Software engineering 
653 |a Case studies 
773 0 |t International Journal of Advanced Computer Science and Applications  |g vol. 16, no. 5 (2025) 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3222641067/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3222641067/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch