CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
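| Published in: | arXiv.org (Dec 7, 2024), p. n/a |
|---|---|
| Main Author: | Wang, Lei |
| Other Authors: | Lian, Jianxun; Huang, Yi; Dai, Yanqi; Li, Haoxuan; Chen, Xu; Xie, Xing; Ji-Rong, Wen |
| Published: | Cornell University Library, arXiv.org |
| Subjects: | Application programming interface; Role playing; Behavior; Large language models; Digital twins |
| Online Access: | Citation/Abstract; Full text outside of ProQuest |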
MARC
| LEADER | | | 00000nab a2200000uu 4500 |
|---|---|---|---|
| 001 | | | 3142733846 |
| 003 | | | UK-CbPIL |
| 022 | | | $a 2331-8422 |
| 035 | | | $a 3142733846 |
| 045 | 0 | | $b d20241207 |
| 100 | 1 | | $a Wang, Lei |
| 245 | 1 | | $a CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds |
| 260 | | | $b Cornell University Library, arXiv.org $c Dec 7, 2024 |
| 513 | | | $a Working Paper |
| 520 | 3 | | $a Role-playing is a crucial capability of Large Language Models (LLMs), enabling a wide range of practical applications, including intelligent non-player characters, digital twins, and emotional companions. Evaluating this capability in LLMs is challenging due to the complex dynamics involved in role-playing, such as maintaining character fidelity throughout a storyline and navigating open-ended narratives without a definitive ground truth. Current evaluation methods, which primarily focus on question-answering or conversational snapshots, fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing. In this paper, we propose CharacterBox, which is a simulation sandbox designed to generate situational fine-grained character behavior trajectories. These behavior trajectories enable a more comprehensive and in-depth evaluation of role-playing capabilities. CharacterBox consists of two main components: the character agent and the narrator agent. The character agent, grounded in psychological and behavioral science, exhibits human-like behaviors, while the narrator agent coordinates interactions between character agents and environmental changes. Additionally, we introduce two trajectory-based methods that leverage CharacterBox to enhance LLM performance. To reduce costs and facilitate the adoption of CharacterBox by public communities, we fine-tune two smaller models, CharacterNR and CharacterRM, as substitutes for GPT API calls, and demonstrate their competitive performance compared to advanced GPT APIs. |
| 653 | | | $a Application programming interface |
| 653 | | | $a Role playing |
| 653 | | | $a Behavior |
| 653 | | | $a Large language models |
| 653 | | | $a Digital twins |
| 700 | 1 | | $a Lian, Jianxun |
| 700 | 1 | | $a Huang, Yi |
| 700 | 1 | | $a Dai, Yanqi |
| 700 | 1 | | $a Li, Haoxuan |
| 700 | 1 | | $a Chen, Xu |
| 700 | 1 | | $a Xie, Xing |
| 700 | 1 | | $a Ji-Rong, Wen |
| 773 | 0 | | $t arXiv.org $g (Dec 7, 2024), p. n/a |
| 786 | 0 | | $d ProQuest $t Engineering Database |
| 856 | 4 | 1 | $3 Citation/Abstract $u https://www.proquest.com/docview/3142733846/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch |
| 856 | 4 | 0 | $3 Full text outside of ProQuest $u http://arxiv.org/abs/2412.05631 |