Explainable Recommendation of Software Vulnerability Repair Based on Metadata Retrieval and Multifaceted LLMs

Saved in:
Bibliographic Details
Published in: Machine Learning and Knowledge Extraction vol. 7, no. 4 (2025), p. 149-180
Main Author: Amoah, Alfred Asare
Other Authors: Liu, Yan
Published: MDPI AG
Subjects: Software reliability; Software; Datasets; Metadata; Artifacts; Deep learning; Static code analysis; Large language models; Self alignment; Knowledge; Retrieval
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3286316464
003 UK-CbPIL
022 |a 2504-4990 
024 7 |a 10.3390/make7040149  |2 doi 
035 |a 3286316464 
045 2 |b d20251001  |b d20251231 
100 1 |a Amoah, Alfred Asare 
245 1 |a Explainable Recommendation of Software Vulnerability Repair Based on Metadata Retrieval and Multifaceted LLMs 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Common Weakness Enumerations (CWEs) and Common Vulnerabilities and Exposures (CVEs) are open knowledge bases that provide definitions, descriptions, and samples of code vulnerabilities. The combination of Large Language Models (LLMs) with vulnerability knowledge bases helps to enhance and automate code vulnerability repair. Several key factors come into play in this setting, including (1) the retrieval of the most relevant context to a specific vulnerable code snippet; (2) augmenting LLM prompts with the retrieved context; and (3) the generated artifact form, such as a code repair with natural language explanations or a code repair only. Artifacts produced by these factors often lack transparency and explainability regarding the rationale behind the repair. In this paper, we propose an LLM-enabled framework for explainable recommendation of vulnerable code repairs with techniques addressing each factor. Our method is data-driven, which means the data characteristics of the selected CWE and CVE datasets and the knowledge base determine the best retrieval strategies. Across 100 experiments, we observe the inadequacy of the SOTA metrics to differentiate between low-quality and irrelevant repairs. To address this limitation, we design the LLM-as-a-Judge framework to enhance the robustness of recommendation assessments. Compared to baselines from prior works, as well as using static code analysis and LLMs in zero-shot, our findings highlight that multifaceted LLMs guided by retrieval context produce explainable and reliable recommendations under a small to mild level of self-alignment bias. Our work is developed on open-source knowledge bases and models, which makes it reproducible and extensible to new datasets and retrieval strategies. 
653 |a Software reliability 
653 |a Software 
653 |a Datasets 
653 |a Metadata 
653 |a Artifacts 
653 |a Deep learning 
653 |a Static code analysis 
653 |a Large language models 
653 |a Self alignment 
653 |a Knowledge 
653 |a Retrieval 
700 1 |a Liu, Yan 
773 0 |t Machine Learning and Knowledge Extraction  |g vol. 7, no. 4 (2025), p. 149-180 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3286316464/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3286316464/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3286316464/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch