ExecRepoBench: Multi-level Executable Code Completion Evaluation

Bibliographic Details
Published in: arXiv.org (Dec 16, 2024), p. n/a
Main Author: Yang, Jian
Other Authors: Zhang, Jiajun; Yang, Jiaxi; Jin, Ke; Zhang, Lei; Peng, Qiyao; Deng, Ken; Miao, Yibo; Liu, Tianyu; Cui, Zeyu; Hui, Binyuan; Lin, Junyang
Published:
Cornell University Library, arXiv.org
Subjects:
Online Access: Citation/Abstract; Full text outside of ProQuest
Description
Abstract: Code completion has become an essential tool for daily software development. Existing evaluation benchmarks often employ static methods that do not fully capture the dynamic nature of real-world coding environments and face significant challenges, including limited context length, reliance on superficial evaluation metrics, and potential overfitting to training datasets. In this work, we introduce a novel framework for enhancing code completion in software development through the creation of a repository-level benchmark, ExecRepoBench, and an instruction corpus, Repo-Instruct, aimed at improving the capability of open-source large language models (LLMs) in real-world coding scenarios that involve complex interdependencies across multiple files. ExecRepoBench includes 1.2K samples from active Python repositories. In addition, we present a multi-level grammar-based completion methodology, conditioned on the abstract syntax tree, that masks code fragments at various logical units (e.g., statements, expressions, and functions). We then fine-tune an open-source 7B-parameter LLM on Repo-Instruct to produce a strong code completion baseline model, Qwen2.5-Coder-Instruct-C. Qwen2.5-Coder-Instruct-C is rigorously evaluated against existing benchmarks, including MultiPL-E and ExecRepoBench, where it consistently outperforms prior baselines across all programming languages. The resulting model can be deployed as a high-performance, local service for programming development (https://execrepobench.github.io/).
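The abstract's multi-level grammar-based masking can be illustrated with a short sketch. The following is a minimal, hypothetical example rather than the paper's implementation: it uses Python's ast module to pick one node at a chosen grammar level (expression, statement, or function) and replaces its source span with a placeholder. The node categories and the <MASK> token are assumptions made only for illustration.

# Illustrative sketch only: the paper describes masking code fragments at
# different grammar levels via the AST, but does not publish this code.
import ast
import random

SOURCE = """\
def area(radius):
    pi = 3.14159
    return pi * radius ** 2
"""

# Map a completion level to AST node types treated as maskable units (assumed).
LEVEL_TO_NODES = {
    "function": (ast.FunctionDef,),
    "statement": (ast.Assign, ast.Return),
    "expression": (ast.BinOp, ast.Call),
}

def mask_random_span(source: str, level: str, seed: int = 0) -> tuple[str, str]:
    """Replace one AST node of the requested level with a <MASK> placeholder.

    Returns (masked_source, ground_truth_span); the removed span can serve
    as the completion target.
    """
    tree = ast.parse(source)
    candidates = [
        node for node in ast.walk(tree)
        if isinstance(node, LEVEL_TO_NODES[level])
    ]
    random.seed(seed)
    node = random.choice(candidates)

    lines = source.splitlines(keepends=True)
    # ast reports 1-based line numbers and 0-based column offsets.
    start = sum(len(l) for l in lines[: node.lineno - 1]) + node.col_offset
    end = sum(len(l) for l in lines[: node.end_lineno - 1]) + node.end_col_offset

    ground_truth = source[start:end]
    masked = source[:start] + "<MASK>" + source[end:]
    return masked, ground_truth

if __name__ == "__main__":
    for level in ("expression", "statement", "function"):
        masked, target = mask_random_span(SOURCE, level)
        print(f"--- {level} level ---")
        print(masked)

Running the sketch prints three masked variants of the same snippet, one per grammar level; in a benchmark setting, the held-out span would be the reference against which a model's completion is executed and checked.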
ISSN:2331-8422
Source: Engineering Database