AI-Assisted Software Analysis and Code Repair

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2024)
Main author: He, Xu
Published: ProQuest Dissertations & Theses
Description
Abstract: Program analysis and code repair are fundamental to uncovering and patching vulnerabilities. However, achieving precision remains challenging when analyzing large, intricate software. An inevitable trade-off exists between software complexity and the scalability of analytical tools, causing traditional program analysis to suffer from efficiency or false alarm issues. With artificial intelligence (AI) becoming increasingly popular, applying AI techniques to tackle such complex issues has attracted significant attention. Consequently, a key research question has emerged: how can AI's exceptional understanding capabilities in natural language and images be transferred to programming languages? In this dissertation, I explore three paradigms that combine AI and program analysis techniques to boost performance in distinct analysis and patching tasks.
First, we present BinProv, a compilation provenance identification tool that reveals the impact of compilation on binary code. Compilation provenance refers to the compiler and optimization levels used, which hinder analysts from intuitively comprehending the encoding semantics of binary code. Traditional methods rely on reverse engineering (e.g., disassembly or decompilation) to restore code semantics; however, these methods are often inaccurate. To address this, BinProv leverages embedding models to learn the program encoding semantics and avoid the inaccurate disassembly process.
Second, we introduce BinGo, a graph-learning-based security patch identification tool that distinguishes security patches in binary code. Security patches are critical to fixing vulnerabilities, but they are hidden among massive, silently released code changes. Traditional program analysis usually involves extracting control flow and data flow graphs to abstract the program's structure. We use a graph model to delve deeper into the program's structural semantics. The graph model can understand code functionality by learning from multiple graphs at scale.
Third, we propose PathFix, an automated program repair solution that combines a static-analysis-based framework with a large language model (LLM) to understand the comprehensive program semantics and repair the buggy program. Automated Program Repair (APR) is a compound task requiring multiple steps, such as precise correctness specification and patch synthesis, which are challenging to achieve using only traditional program analysis or only LLMs. Traditional methods provide a static-analysis-based repair workflow but have limited scalability. Meanwhile, LLMs offer promising performance on code behavior summarization and code generation tasks. Therefore, we employ a static-analysis-based framework to decompose the steps and guide the LLM in finding feasible repair paths and generating appropriate patches.
Through extensive experiments, we observe that these AI-assisted solutions effectively improve performance in the three tasks. They also pave the way for further exploration of combining AI and program analysis techniques to enhance software security.
ISBN: 9798302829696
Source: ProQuest Dissertations & Theses Global