CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

Bibliographic details
Published in:	arXiv.org (Dec 19, 2024), p. n/a
Main author:	Leitian Tao
Other authors:	Chen, Xiang; Yu, Tong; Tung Mai; Rossi, Ryan; Li, Yixuan; Mitra, Saayan
Published:	Cornell University Library, arXiv.org
Subjects:	Data analysis; Source code; Codes; Bridge failure; Large language models
Online access:	Citation/Abstract; Full text outside of ProQuest

Description
Abstract:	Large Language Models (LLMs) have revolutionized code generation but require significant resources and often over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs provides a cost-effective alternative. However, standard supervised approaches rely only on correct examples, missing valuable insights from failures. We introduce CodeLutra, a framework that leverages both correct and incorrect code attempts. Instead of using only correct solutions, CodeLutra applies iterative preference-based refinement, comparing successful and failed outputs to better approximate desired results. This approach narrows the performance gap with state-of-the-art larger models without requiring massive datasets or auxiliary models. For instance, on a challenging data science coding task, using only 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's level. By learning from both successes and mistakes, CodeLutra provides a scalable and efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
ISSN:	2331-8422
Source:	Engineering Database