Binary–Source Code Matching Based on Decompilation Techniques and Graph Analysis

Bibliographic Details
Published in:	International Journal of Advanced Computer Science and Applications vol. 16, no. 5 (2025)
Main Author:	PDF
Published:	Science and Information (SAI) Organization Limited
Subjects:	King Abdulaziz University; C++ (programming language); Datasets; Semantics; Source code; Matching; Binary codes; Flow graphs; Accuracy; Computer science; Reverse engineering; C plus plus; Cloning; Graph representations; Software engineering; Case studies
Online Access:	Citation/Abstract; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3222641067
003	UK-CbPIL
022			\|a 2158-107X
022			\|a 2156-5570
024	7		\|a 10.14569/IJACSA.2025.0160525 \|2 doi
035			\|a 3222641067
045	2		\|b d20250101 \|b d20251231
100	1		\|a PDF
245	1		\|a Binary–Source Code Matching Based on Decompilation Techniques and Graph Analysis
260			\|b Science and Information (SAI) Organization Limited \|c 2025
513			\|a Journal Article
520	3		\|a Recent approaches to binary–source code matching often operate at the intermediate representation (IR) level, with some applying the matching process at the binary level by compiling the source code to binary and then matching it directly with the binary code. Others, though less common, perform matching at the decompiler-generated pseudo-code level by first decompiling the binary code into pseudo-code and then comparing it with the source code. However, all these approaches are limited by the loss of semantic information in the original source code and the introduction of noise during compilation and decompilation, making accurate matching challenging and often requiring specialized expertise. To address these limitations, this study introduces a system for binary–source code matching based on decompilation techniques and Graph analysis (BSMDG) that matches binary code with source code at the source code level. Our method utilizes the Ghidra decompiler in conjunction with a custom-built transpiler to reconstruct high-level C++ source code from binary executables. Subsequently, call graphs (CGs) and control flow graphs (CFGs) are generated for both the original and translated code to evaluate their structural and semantic similarities. To evaluate our system, we used a curated dataset of C++ source code and corresponding binary files collected from the AtCoder website for training and testing. Additionally, a case study was conducted using the widely recognized POJ-104 benchmark dataset to assess the system's generalizability. The results demonstrate the effectiveness of combining decompilation with graph-based analysis, with our system achieving 90% accuracy on POJ-104, highlighting its potential in code clone detection, vulnerability identification, and reverse engineering tasks.
610		4	\|a King Abdulaziz University
653			\|a C++ (programming language)
653			\|a Datasets
653			\|a Semantics
653			\|a Source code
653			\|a Matching
653			\|a Binary codes
653			\|a Flow graphs
653			\|a Accuracy
653			\|a Computer science
653			\|a Reverse engineering
653			\|a C plus plus
653			\|a Cloning
653			\|a Graph representations
653			\|a Software engineering
653			\|a Case studies
773	0		\|t International Journal of Advanced Computer Science and Applications \|g vol. 16, no. 5 (2025)
786	0		\|d ProQuest \|t Advanced Technologies & Aerospace Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3222641067/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3222641067/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch