CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph

Bibliographic Details
Published in:	arXiv.org (Dec 20, 2024), p. n/a
Main Author:	Xu, Hanxiang
Other Authors:	Ma, Wei; Zhou, Ting; Zhao, Yanjie; Chen, Kai; Hu, Qiang; Liu, Yang; Wang, Haoyu
Published:	Cornell University Library, arXiv.org
Subjects:	Source code; Graphs; Intelligent agents; Large language models; Software reliability; Open source software; Graph theory; Debugging; Program verification (computers); Knowledge representation; Software testing
Links:	Citation/Abstract; Full text outside of ProQuest

MARC


LEADER	00000nab a2200000uu 4500
001	3130503919
003	UK-CbPIL
022			\|a 2331-8422
035			\|a 3130503919
045	0		\|b d20241220
100	1		\|a Xu, Hanxiang
245	1		\|a CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph
260			\|b Cornell University Library, arXiv.org \|c Dec 20, 2024
513			\|a Working Paper
520	3		\|a In recent years, the programming capabilities of large language models (LLMs) have garnered significant attention. Fuzz testing, a highly effective technique, plays a key role in enhancing software reliability and detecting vulnerabilities. However, traditional fuzz testing tools rely on manually crafted fuzz drivers, which can limit both testing efficiency and effectiveness. To address this challenge, we propose an automated fuzz testing method driven by a code knowledge graph and powered by an LLM-based intelligent agent system, referred to as CKGFuzzer. We approach fuzz driver creation as a code generation task, leveraging the knowledge graph of the code repository to automate the generation process within the fuzzing loop, while continuously refining both the fuzz driver and input seeds. The code knowledge graph is constructed through interprocedural program analysis, where each node in the graph represents a code entity, such as a function or a file. The knowledge graph-enhanced CKGFuzzer not only effectively resolves compilation errors in fuzz drivers and generates input seeds tailored to specific API usage scenarios, but also analyzes fuzz driver crash reports, assisting developers in improving code quality. By querying the knowledge graph of the code repository and learning from API usage scenarios, we can better identify testing targets and understand the specific purpose of each fuzz driver. We evaluated our approach using eight open-source software projects. The experimental results indicate that CKGFuzzer achieved an average improvement of 8.73% in code coverage compared to state-of-the-art techniques. Additionally, CKGFuzzer reduced the manual review workload in crash case analysis by 84.4% and successfully detected 11 real bugs (including nine previously unreported bugs) across the tested libraries.
653			\|a Source code
653			\|a Graphs
653			\|a Intelligent agents
653			\|a Large language models
653			\|a Software reliability
653			\|a Open source software
653			\|a Graph theory
653			\|a Debugging
653			\|a Program verification (computers)
653			\|a Knowledge representation
653			\|a Software testing
700	1		\|a Ma, Wei
700	1		\|a Zhou, Ting
700	1		\|a Zhao, Yanjie
700	1		\|a Chen, Kai
700	1		\|a Hu, Qiang
700	1		\|a Liu, Yang
700	1		\|a Wang, Haoyu
773	0		\|t arXiv.org \|g (Dec 20, 2024), p. n/a
786	0		\|d ProQuest \|t Engineering Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3130503919/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch
856	4	0	\|3 Full text outside of ProQuest \|u http://arxiv.org/abs/2411.11532