ExecRepoBench: Multi-level Executable Code Completion Evaluation
Saved in:
| Published in: | arXiv.org (Dec 16, 2024), p. n/a |
|---|---|
| Main author: | |
| Other authors: | |
| Publisher: | Cornell University Library, arXiv.org |
| Subjects: | |
| Read online: | Citation/Abstract; Full text outside of ProQuest |
| Abstract: | Code completion has become an essential tool for daily software development. Existing evaluation benchmarks often employ static methods that do not fully capture the dynamic nature of real-world coding environments and face significant challenges, including limited context length, reliance on superficial evaluation metrics, and potential overfitting to training datasets. In this work, we introduce a novel framework for enhancing code completion in software development through the creation of a repository-level benchmark, ExecRepoBench, and an instruction corpus, Repo-Instruct, aimed at improving the capabilities of open-source large language models (LLMs) in real-world coding scenarios that involve complex interdependencies across multiple files. ExecRepoBench includes 1.2K samples from active Python repositories. In addition, we present a multi-level grammar-based completion methodology, conditioned on the abstract syntax tree, that masks code fragments at various logical units (e.g., statements, expressions, and functions). We then fine-tune an open-source 7B-parameter LLM on Repo-Instruct to produce a strong code completion baseline model, Qwen2.5-Coder-Instruct-C. Qwen2.5-Coder-Instruct-C is rigorously evaluated against existing benchmarks, including MultiPL-E and ExecRepoBench, and consistently outperforms prior baselines across all programming languages. The resulting model can be deployed as a high-performance, local service for programming development (https://execrepobench.github.io/). |
|---|---|
| ISSN: | 2331-8422 |
| Source: | Engineering Database |
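
The abstract describes masking code fragments at the level of AST-defined logical units (statements, expressions, functions) so that a model must fill in the missing span. The sketch below is a minimal illustration of that general idea in Python, not the paper's actual pipeline: the function name `mask_random_node`, the node-category mapping, and the `<MASK>` placeholder are assumptions introduced here for illustration.

```python
# Illustrative sketch (not the authors' code): mask one AST node so a model
# can be asked to complete the missing fragment (fill-in-the-middle style).
import ast
import random

MASK_TOKEN = "<MASK>"  # hypothetical placeholder token


def mask_random_node(source: str, level: str = "statement"):
    """Return (prefix, ground_truth, suffix) for one masked logical unit."""
    tree = ast.parse(source)
    # Node categories are an assumption based on the abstract's
    # "statements, expressions, and functions".
    node_types = {
        "expression": (ast.expr,),
        "statement": (ast.stmt,),
        "function": (ast.FunctionDef, ast.AsyncFunctionDef),
    }[level]
    candidates = [
        n for n in ast.walk(tree)
        if isinstance(n, node_types) and hasattr(n, "end_col_offset")
    ]
    if not candidates:
        raise ValueError("no maskable node at this level")
    node = random.choice(candidates)

    # Convert (lineno, col_offset) positions to absolute character offsets.
    lines = source.splitlines(keepends=True)
    line_starts = [0]
    for line in lines:
        line_starts.append(line_starts[-1] + len(line))
    start = line_starts[node.lineno - 1] + node.col_offset
    end = line_starts[node.end_lineno - 1] + node.end_col_offset

    return source[:start], source[start:end], source[end:]


if __name__ == "__main__":
    code = "def add(a, b):\n    total = a + b\n    return total\n"
    prefix, target, suffix = mask_random_node(code, level="statement")
    print(prefix + MASK_TOKEN + suffix)  # completion prompt
    print("ground truth:", target)       # reference span for scoring
```

In this sketch the returned prefix/suffix pair forms the completion context and the masked span serves as the reference answer; an execution-based benchmark like the one described would then run the repository's tests on the completed code rather than comparing strings.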