An advanced computing approach for software vulnerability detection

Uloženo v:

Podrobná bibliografie
Vydáno v:	Multimedia Tools and Applications vol. 83, no. 39 (Nov 2024), p. 86707
Hlavní autor:	Do Xuan, Cho
Další autoři:	Cong, B. V.
Vydáno:	Springer Nature B.V.
Témata:	Software reliability Feature extraction Software Source code Vulnerability Natural language processing Synthesis
On-line přístup:	Citation/Abstract Full Text - PDF
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Abstrakt:	Detecting software vulnerabilities is a very urgent problem today. One of the common approaches for detecting software vulnerabilities is source code analysis. In this paper, to improve the effectiveness of the software vulnerability detection model based on source code analysis, we propose a novel model called GRD. The GRD model performs source code analysis to find and conclude about source code vulnerabilities based on a combination of two main methods: Feature Intelligent Extraction and Rebalancing Data. In particular, Feature Intelligent Extraction, which includes two models: deep graph networks and natural language processing (NLP) techniques, is responsible for synthesizing and extracting features of source code in the code property graph (CPG) form. Rebalancing Data has the function of balancing data to improve the efficiency of the source code classification task. The main characteristics of our proposal in this paper include two main phases as follows. The first phase extracts and synthesizes source code features into the CPG form. At this phase, the article proposes using Graph Convolution Network (GCN) to extract CPG features, and RoBERTa to extract source code snippets on the node of CPG. In the second phase, based on the feature vectors of the source code obtained in phase 1, the article proposes using the Dropout technique to generate data to balance among labels. Finally, the feature vectors generated after the Dropout technique are used to predict source code vulnerabilities. The study evaluates the proposed model on two common datasets: Verum and FFMQ. The experimental results in the article have shown the superiority of the proposed model compared to other approaches on all measures.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-024-19682-y
Zdroj:	ABI/INFORM Global