We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs

Saved in:
Bibliographic Details
Published in: arXiv.org (Sep 24, 2024), p. n/a
Main Author: Spracklen, Joseph
Other Authors: Wijewickrama, Raveen; A H M Nazmus Sakib; Maiti, Anindya; Viswanath, Bimal; Murtuza Jadliwala
Publication: Cornell University Library, arXiv.org
Subjects: Computer program integrity; Supply chains; Programming languages; Python; Large language models; Open source software; Software
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3069344054
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3069344054 
045 0 |b d20240924 
100 1 |a Spracklen, Joseph 
245 1 |a We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs 
260 |b Cornell University Library, arXiv.org  |c Sep 24, 2024 
513 |a Working Paper 
520 3 |a The reliance of popular programming languages such as Python and JavaScript on centralized package repositories and open-source software, combined with the emergence of code-generating Large Language Models (LLMs), has created a new type of threat to the software supply chain: package hallucinations. These hallucinations, which arise from fact-conflicting errors when generating code using LLMs, represent a novel form of package confusion attack that poses a critical threat to the integrity of the software supply chain. This paper conducts a rigorous and comprehensive evaluation of package hallucinations across different programming languages, settings, and parameters, exploring how a diverse set of models and configurations affect the likelihood of generating erroneous package recommendations and identifying the root causes of this phenomenon. Using 16 popular LLMs for code generation and two unique prompt datasets, we generate 576,000 code samples in two programming languages that we analyze for package hallucinations. Our findings reveal that the average percentage of hallucinated packages is at least 5.2% for commercial models and 21.7% for open-source models, including a staggering 205,474 unique examples of hallucinated package names, further underscoring the severity and pervasiveness of this threat. To overcome this problem, we implement several hallucination mitigation strategies and show that they are able to significantly reduce the number of package hallucinations while maintaining code quality. Our experiments and findings highlight package hallucinations as a persistent and systemic phenomenon while using state-of-the-art LLMs for code generation, and a significant challenge that deserves the research community's urgent attention. 
653 |a Computer program integrity 
653 |a Supply chains 
653 |a Programming languages 
653 |a Python 
653 |a Large language models 
653 |a Open source software 
653 |a Software 
700 1 |a Wijewickrama, Raveen 
700 1 |a A H M Nazmus Sakib 
700 1 |a Maiti, Anindya 
700 1 |a Viswanath, Bimal 
700 1 |a Murtuza Jadliwala 
773 0 |t arXiv.org  |g (Sep 24, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3069344054/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2406.10279
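
The threat described in the abstract can be made concrete with a small defensive check: before installing a package name that appears in LLM-generated code, verify that the name actually exists in the target registry. The sketch below is a minimal illustration, not the mitigation strategy evaluated in the paper; it queries PyPI's public JSON API (https://pypi.org/pypi/<name>/json), and the helper name package_exists_on_pypi and the second demo package name are hypothetical examples.

import urllib.error
import urllib.request


def package_exists_on_pypi(name: str) -> bool:
    """Return True if `name` is a registered project on PyPI."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200  # 200 -> the project exists
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # no such project: a likely hallucinated name
        raise  # other HTTP errors are inconclusive; surface them


if __name__ == "__main__":
    # "requests" is a real package; the second name is a made-up example.
    for pkg in ["requests", "totally-nonexistent-pkg-314159"]:
        verdict = "exists" if package_exists_on_pypi(pkg) else "NOT on PyPI"
        print(f"{pkg}: {verdict}")

Note that existence alone is a weak signal: an attacker can pre-register a commonly hallucinated name, which is exactly the package confusion attack the abstract describes, so a registry check complements rather than replaces careful review of unfamiliar dependencies.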