Improving FIM Code Completions via Context & Curriculum Based Learning
| Published in: | arXiv.org (Dec 21, 2024), p. n/a |
|---|---|
| Main Author: | Sagtani, Hitesh |
| Other Authors: | Mehrotra, Rishabh; Liu, Beyang |
| Published: | Cornell University Library, arXiv.org |
| Subjects: | Curricula; Datasets; Static code analysis; Learning; Real time; Context; Task complexity; Benchmarks; Ablation |
| Online Access: | Citation/Abstract; Full text outside of ProQuest |
MARC
| Tag | Ind1 | Ind2 | Data |
|---|---|---|---|
| LEADER | | | 00000nab a2200000uu 4500 |
| 001 | | | 3148979798 |
| 003 | | | UK-CbPIL |
| 022 | | | \|a 2331-8422 |
| 035 | | | \|a 3148979798 |
| 045 | 0 | | \|b d20241221 |
| 100 | 1 | | \|a Sagtani, Hitesh |
| 245 | 1 | | \|a Improving FIM Code Completions via Context & Curriculum Based Learning |
| 260 | | | \|b Cornell University Library, arXiv.org \|c Dec 21, 2024 |
| 513 | | | \|a Working Paper |
| 520 | 3 | | \|a Fill-in-the-Middle (FIM) models play a vital role in code completion tasks, leveraging both prefix and suffix context to provide more accurate and contextually relevant suggestions. This paper presents approaches to improve FIM code completion while addressing the challenge of maintaining low latency for real-time coding assistance. We enhance FIM code completion by incorporating context and curriculum examples in the training process. We identify patterns where completion suggestions fail more frequently, revealing complexities that smaller language models struggle with. To address these challenges, we develop a curriculum dataset by extracting hard-to-complete patterns from code repositories and generate context examples using semantic and static analysis tools (e.g., the TSC compiler). We fine-tune models of various sizes, including StarCoder and DeepSeek, on this enhanced dataset. Our evaluation encompasses three key dimensions: the SantaCoder FIM task, the Amazon CCEval benchmark, and a new Multi-Line Infilling evaluation benchmark derived from SWE-bench. Comprehensive ablation studies across multiple model sizes reveal that while all fine-tuned models show improvements, the performance gains are more pronounced for smaller models, and incorporating difficult-to-complete examples as part of curriculum learning further improves code completion performance. This finding is particularly significant given the latency constraints of code completion tasks. While larger models like GPT and Claude perform well in multi-line completions, they are prohibitively challenging to use given their high latency; our fine-tuned models achieve a balance between performance and latency. Finally, we validate our approach through online A/B testing, demonstrating tangible improvements in Completion Acceptance Rate (CAR) and Completion Persistence Rate (CPR), with zero latency impact. |
| 653 | | | \|a Curricula |
| 653 | | | \|a Datasets |
| 653 | | | \|a Static code analysis |
| 653 | | | \|a Learning |
| 653 | | | \|a Real time |
| 653 | | | \|a Context |
| 653 | | | \|a Task complexity |
| 653 | | | \|a Benchmarks |
| 653 | | | \|a Ablation |
| 700 | 1 | | \|a Mehrotra, Rishabh |
| 700 | 1 | | \|a Liu, Beyang |
| 773 | 0 | | \|t arXiv.org \|g (Dec 21, 2024), p. n/a |
| 786 | 0 | | \|d ProQuest \|t Engineering Database |
| 856 | 4 | 1 | \|3 Citation/Abstract \|u https://www.proquest.com/docview/3148979798/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch |
| 856 | 4 | 0 | \|3 Full text outside of ProQuest \|u http://arxiv.org/abs/2412.16589 |
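
The abstract in field 520 centers on Fill-in-the-Middle completion, where the model conditions on both the code before and after the cursor. As a rough illustration only (not taken from the paper or this record), a StarCoder-style FIM prompt is typically assembled in prefix-suffix-middle order with sentinel tokens; the truncation limits below are illustrative assumptions standing in for whatever token-budget logic a real completion service would use.

```python
# Minimal sketch: building a prefix-suffix-middle (PSM) FIM prompt with
# StarCoder-style sentinel tokens. The truncation limits are assumptions
# for illustration, not the authors' pipeline.

FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str,
                     max_prefix_chars: int = 2000,
                     max_suffix_chars: int = 1000) -> str:
    """Assemble a PSM-format FIM prompt; the model generates the missing
    middle after the <fim_middle> token."""
    prefix = prefix[-max_prefix_chars:]   # keep the code nearest the cursor
    suffix = suffix[:max_suffix_chars]    # keep the code just after the cursor
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

if __name__ == "__main__":
    before_cursor = "def add(a: int, b: int) -> int:\n    return "
    after_cursor = "\n\nprint(add(2, 3))\n"
    print(build_fim_prompt(before_cursor, after_cursor))
```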