LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis

Bibliographic Details
Published in: arXiv.org (Dec 18, 2024), p. n/a
Main Author: Wang, Chengpeng
Other Authors: Gao, Yifei; Zhang, Wuqi; Liu, Xuwei; Shi, Qingkai; Zhang, Xiangyu
Published: Cornell University Library, arXiv.org
Subjects: Performance enhancement; Semantics; Misalignment; Hallucinations; Prompt engineering; Static code analysis; Performance evaluation; Large language models; Natural language processing; Decomposition
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3147567323
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3147567323 
045 0 |b d20241218 
100 1 |a Wang, Chengpeng 
245 1 |a LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis 
260 |b Cornell University Library, arXiv.org  |c Dec 18, 2024 
513 |a Working Paper 
520 3 |a Static analysis is essential for program optimization, bug detection, and debugging, but its reliance on compilation and its limited support for customization hamper practical use. Recent advances in LLMs enable a new paradigm of compilation-free, customizable analysis via prompting: LLMs excel at interpreting the semantics of small code snippets and let users define analysis tasks in natural language with few-shot examples. However, misalignment with program semantics can cause hallucinations, especially in sophisticated semantic analysis of lengthy code snippets. We propose LLMSA, a compositional neuro-symbolic approach to compilation-free, customizable static analysis with reduced hallucinations. Specifically, we propose an analysis policy language that lets users decompose an analysis problem into several sub-problems, each targeting a simple syntactic or semantic property of a smaller code snippet. This decomposition lets the LLMs tackle more manageable semantics-related sub-problems, while the syntactic ones are resolved by parsing-based analysis without hallucinations. An analysis policy is evaluated with lazy, incremental, and parallel prompting, which mitigates hallucinations and improves performance. LLMSA achieves performance comparable and even superior to existing techniques across various clients. For instance, it attains 66.27% precision and 78.57% recall in taint vulnerability detection, surpassing an industrial approach in F1 score by 0.20. 
653 |a Performance enhancement 
653 |a Semantics 
653 |a Misalignment 
653 |a Hallucinations 
653 |a Prompt engineering 
653 |a Static code analysis 
653 |a Performance evaluation 
653 |a Large language models 
653 |a Natural language processing 
653 |a Decomposition 
700 1 |a Gao, Yifei 
700 1 |a Zhang, Wuqi 
700 1 |a Liu, Xuwei 
700 1 |a Shi, Qingkai 
700 1 |a Zhang, Xiangyu 
773 0 |t arXiv.org  |g (Dec 18, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3147567323/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.14399