ExecRepoBench: Multi-level Executable Code Completion Evaluation
Saved in:
| Container / Database: | arXiv.org (Dec 16, 2024), p. n/a |
|---|---|
| Main author: | Yang, Jian |
| Other authors: | Zhang, Jiajun; Yang, Jiaxi; Jin, Ke; Zhang, Lei; Peng, Qiyao; Deng, Ken; Miao, Yibo; Liu, Tianyu; Cui, Zeyu; Hui, Binyuan; Lin, Junyang |
| Published in: | Cornell University Library, arXiv.org |
| Subjects: | Repositories; Python; Source code; Software development; Large language models; Coders; Open source software; Programming languages; Benchmarks; Coding |
| Online access: | Citation/Abstract; Full text outside of ProQuest |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3145904181 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2331-8422 | ||
| 035 | |a 3145904181 | ||
| 045 | 0 | |b d20241216 | |
| 100 | 1 | |a Yang, Jian | |
| 245 | 1 | |a ExecRepoBench: Multi-level Executable Code Completion Evaluation | |
| 260 | |b Cornell University Library, arXiv.org |c Dec 16, 2024 | ||
| 513 | |a Working Paper | ||
| 520 | 3 | |a Code completion has become an essential tool for daily software development. Existing evaluation benchmarks often employ static methods that do not fully capture the dynamic nature of real-world coding environments and face significant challenges, including limited context length, reliance on superficial evaluation metrics, and potential overfitting to training datasets. In this work, we introduce a novel framework for enhancing code completion in software development through the creation of a repository-level benchmark, ExecRepoBench, and an instruction corpus, Repo-Instruct, aimed at improving the functionality of open-source large language models (LLMs) in real-world coding scenarios that involve complex interdependencies across multiple files. ExecRepoBench includes 1.2K samples from active Python repositories. In addition, we present a multi-level grammar-based completion methodology conditioned on the abstract syntax tree to mask code fragments at various logical units (e.g. statements, expressions, and functions). We then fine-tune an open-source LLM with 7B parameters on Repo-Instruct to produce a strong code completion baseline model, Qwen2.5-Coder-Instruct-C. Qwen2.5-Coder-Instruct-C is rigorously evaluated against existing benchmarks, including MultiPL-E and ExecRepoBench, where it consistently outperforms prior baselines across all programming languages. The deployed model can be used as a high-performance, local service for programming development (https://execrepobench.github.io/). | |
| 653 | |a Repositories | ||
| 653 | |a Python | ||
| 653 | |a Source code | ||
| 653 | |a Software development | ||
| 653 | |a Large language models | ||
| 653 | |a Coders | ||
| 653 | |a Open source software | ||
| 653 | |a Programming languages | ||
| 653 | |a Benchmarks | ||
| 653 | |a Coding | ||
| 700 | 1 | |a Zhang, Jiajun | |
| 700 | 1 | |a Yang, Jiaxi | |
| 700 | 1 | |a Jin, Ke | |
| 700 | 1 | |a Zhang, Lei | |
| 700 | 1 | |a Peng, Qiyao | |
| 700 | 1 | |a Deng, Ken | |
| 700 | 1 | |a Miao, Yibo | |
| 700 | 1 | |a Liu, Tianyu | |
| 700 | 1 | |a Cui, Zeyu | |
| 700 | 1 | |a Hui, Binyuan | |
| 700 | 1 | |a Lin, Junyang | |
| 773 | 0 | |t arXiv.org |g (Dec 16, 2024), p. n/a | |
| 786 | 0 | |d ProQuest |t Engineering Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3145904181/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch |
| 856 | 4 | 0 | |3 Full text outside of ProQuest |u http://arxiv.org/abs/2412.11990 |
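The abstract above describes a multi-level grammar-based completion methodology that masks code fragments at various logical units (statements, expressions, functions) conditioned on the abstract syntax tree. As a rough illustration only (not the paper's actual pipeline), this kind of AST-level masking can be sketched with Python's standard `ast` module; the `mask_nodes` function and the `<MASK>` placeholder are hypothetical names introduced here for the example:

```python
import ast


def mask_nodes(source: str, node_type: type) -> list[tuple[str, str]]:
    """For each AST node of the given type, produce one (masked_source,
    ground_truth) pair: the node's source span is replaced by a <MASK>
    placeholder, and the removed span becomes the completion target.

    This is a simplified sketch of grammar-based masking at a chosen
    logical unit (e.g. ast.stmt, ast.expr, or ast.FunctionDef).
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    samples = []
    for node in ast.walk(tree):
        # end_lineno is available on located nodes in Python 3.8+.
        if isinstance(node, node_type) and hasattr(node, "end_lineno"):
            start, end = node.lineno - 1, node.end_lineno
            ground_truth = "\n".join(lines[start:end])
            masked = "\n".join(lines[:start] + ["<MASK>"] + lines[end:])
            samples.append((masked, ground_truth))
    return samples


code = "def add(a, b):\n    return a + b\n"
# Mask at function level: the whole `add` definition becomes the target.
for masked, target in mask_nodes(code, ast.FunctionDef):
    print(masked)
    print(target)
```

Masking at finer granularities is just a different `node_type` argument (e.g. `ast.Return` for a single statement); a real pipeline would additionally handle column offsets for sub-line expressions.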