Explainable Recommendation of Software Vulnerability Repair Based on Metadata Retrieval and Multifaceted LLMs

Saved in:
Bibliographic Details
Published in: Machine Learning and Knowledge Extraction vol. 7, no. 4 (2025), p. 149-180
Main Author: Amoah, Alfred Asare
Other Authors: Liu, Yan
Published: MDPI AG
Subjects: Software reliability; Software; Datasets; Metadata; Artifacts; Deep learning; Static code analysis; Large language models; Self alignment; Knowledge; Retrieval
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3286316464
003 UK-CbPIL
022 |a 2504-4990 
024 7 |a 10.3390/make7040149  |2 doi 
035 |a 3286316464 
045 2 |b d20251001  |b d20251231 
100 1 |a Amoah, Alfred Asare 
245 1 |a Explainable Recommendation of Software Vulnerability Repair Based on Metadata Retrieval and Multifaceted LLMs 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Common Weakness Enumerations (CWEs) and Common Vulnerabilities and Exposures (CVEs) are open knowledge bases that provide definitions, descriptions, and samples of code vulnerabilities. The combination of Large Language Models (LLMs) with vulnerability knowledge bases helps to enhance and automate code vulnerability repair. Several key factors come into play in this setting, including (1) the retrieval of the most relevant context to a specific vulnerable code snippet; (2) augmenting LLM prompts with the retrieved context; and (3) the generated artifact form, such as a code repair with natural language explanations or a code repair only. Artifacts produced by these factors often lack transparency and explainability regarding the rationale behind the repair. In this paper, we propose an LLM-enabled framework for explainable recommendation of vulnerable code repairs with techniques addressing each factor. Our method is data-driven, which means the data characteristics of the selected CWE and CVE datasets and the knowledge base determine the best retrieval strategies. Across 100 experiments, we observe the inadequacy of the SOTA metrics to differentiate between low-quality and irrelevant repairs. To address this limitation, we design the LLM-as-a-Judge framework to enhance the robustness of recommendation assessments. Compared to baselines from prior works, as well as using static code analysis and LLMs in zero-shot, our findings highlight that multifaceted LLMs guided by retrieval context produce explainable and reliable recommendations under a small to mild level of self-alignment bias. Our work is developed on open-source knowledge bases and models, which makes it reproducible and extensible to new datasets and retrieval strategies. 
653 |a Software reliability 
653 |a Software 
653 |a Datasets 
653 |a Metadata 
653 |a Artifacts 
653 |a Deep learning 
653 |a Static code analysis 
653 |a Large language models 
653 |a Self alignment 
653 |a Knowledge 
653 |a Retrieval 
700 1 |a Liu, Yan 
773 0 |t Machine Learning and Knowledge Extraction  |g vol. 7, no. 4 (2025), p. 149-180 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3286316464/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3286316464/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3286316464/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch