CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds

Bibliographic Details
Published in: arXiv.org (Dec 7, 2024), p. n/a
Main Author: Wang, Lei
Other Authors: Lian, Jianxun, Huang, Yi, Dai, Yanqi, Li, Haoxuan, Chen, Xu, Xie, Xing, Ji-Rong, Wen
Published: Cornell University Library, arXiv.org
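Subjects: Application programming interface, Role playing, Behavior, Large language models, Digital twins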
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3142733846
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3142733846 
045 0 |b d20241207 
100 1 |a Wang, Lei 
245 1 |a CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds 
260 |b Cornell University Library, arXiv.org  |c Dec 7, 2024 
513 |a Working Paper 
520 3 |a Role-playing is a crucial capability of Large Language Models (LLMs), enabling a wide range of practical applications, including intelligent non-player characters, digital twins, and emotional companions. Evaluating this capability in LLMs is challenging due to the complex dynamics involved in role-playing, such as maintaining character fidelity throughout a storyline and navigating open-ended narratives without a definitive ground truth. Current evaluation methods, which primarily focus on question-answering or conversational snapshots, fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing. In this paper, we propose CharacterBox, which is a simulation sandbox designed to generate situational fine-grained character behavior trajectories. These behavior trajectories enable a more comprehensive and in-depth evaluation of role-playing capabilities. CharacterBox consists of two main components: the character agent and the narrator agent. The character agent, grounded in psychological and behavioral science, exhibits human-like behaviors, while the narrator agent coordinates interactions between character agents and environmental changes. Additionally, we introduce two trajectory-based methods that leverage CharacterBox to enhance LLM performance. To reduce costs and facilitate the adoption of CharacterBox by public communities, we fine-tune two smaller models, CharacterNR and CharacterRM, as substitutes for GPT API calls, and demonstrate their competitive performance compared to advanced GPT APIs. 
653 |a Application programming interface 
653 |a Role playing 
653 |a Behavior 
653 |a Large language models 
653 |a Digital twins 
700 1 |a Lian, Jianxun 
700 1 |a Huang, Yi 
700 1 |a Dai, Yanqi 
700 1 |a Li, Haoxuan 
700 1 |a Chen, Xu 
700 1 |a Xie, Xing 
700 1 |a Ji-Rong, Wen 
773 0 |t arXiv.org  |g (Dec 7, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3142733846/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.05631