Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents

Saved in:
Bibliographic Details
Published in: arXiv.org (Dec 24, 2024), p. n/a
Main author: Ning, Kaiwen
Other authors: Chen, Jiachi; Zhang, Jingwen; Wei, Lia; Wang, Zexu; Feng, Yuming; Zhang, Weizhe; Zheng, Zibin
Published:
Cornell University Library, arXiv.org
Subjects: Descriptions; Static code analysis; Large language models; Defects; Workflow; Natural language
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3149108561
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3149108561 
045 0 |b d20241224 
100 1 |a Ning, Kaiwen 
245 1 |a Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents 
260 |b Cornell University Library, arXiv.org  |c Dec 24, 2024 
513 |a Working Paper 
520 3 |a AI agents are systems capable of perceiving their environment and autonomously planning and executing tasks. Recent advancements in LLMs have introduced a transformative paradigm for AI agents, enabling them to interact with external resources and tools through prompts. In such agents, the workflow integrates developer-written code, which manages framework construction and logic control, with LLM-generated natural language that enhances dynamic decision-making and interaction. However, discrepancies between developer-implemented logic and the dynamically generated content of LLMs, in terms of behavior and expected outcomes, can lead to defects such as tool invocation failures and task execution errors. These issues introduce specific risks, leading to various defects in LLM-based AI agents, such as service interruptions. Despite the importance of these issues, there is a lack of systematic work focused on analyzing LLM-based AI agents to uncover defects in their code. In this paper, we present the first study focused on identifying and detecting defects in LLM agents. We collected and analyzed 6,854 relevant posts from StackOverflow to define 8 types of agent defects. For each type, we provide detailed descriptions with an example. We then designed a static analysis tool, named Agentable, to detect these defects. Agentable leverages Code Property Graphs and LLMs to analyze agent workflows, efficiently identifying specific code patterns and analyzing natural language descriptions. To evaluate Agentable, we constructed two datasets: AgentSet, which consists of 84 real-world agents, and AgentTest, which contains 78 agents specifically designed to include various types of defects. Our results show that Agentable achieved an overall accuracy of 88.79% and a recall rate of 91.03%. Furthermore, our analysis reveals 889 defects in AgentSet, highlighting the prevalence of these defects. 
653 |a Descriptions 
653 |a Static code analysis 
653 |a Large language models 
653 |a Defects 
653 |a Workflow 
653 |a Natural language 
700 1 |a Chen, Jiachi 
700 1 |a Zhang, Jingwen 
700 1 |a Wei, Lia 
700 1 |a Wang, Zexu 
700 1 |a Feng, Yuming 
700 1 |a Zhang, Weizhe 
700 1 |a Zheng, Zibin 
773 0 |t arXiv.org  |g (Dec 24, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3149108561/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.18371