Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights

Bibliographic Details
Published in:	arXiv.org (Oct 16, 2024), p. n/a
Main author:	Krishna, Rahul
Other authors:	Pan, Rangeet; Pavuluri, Raju; Tamilselvam, Srikanth; Vukovic, Maja; Sinha, Saurabh
Published:	Cornell University Library, arXiv.org
Subjects:	Learning curves; Python; Program verification (computers); Source code; Static code analysis; Large language models; Libraries; Open source software; Programming languages; Effectiveness
Online access:	Citation/Abstract; Full text outside of ProQuest

MARC


LEADER	00000nab a2200000uu 4500
001	3118116875
003	UK-CbPIL
022			\|a 2331-8422
035			\|a 3118116875
045	0		\|b d20241016
100	1		\|a Krishna, Rahul
245	1		\|a Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights
260			\|b Cornell University Library, arXiv.org \|c Oct 16, 2024
513			\|a Working Paper
520	3		\|a Large Language Models for Code (or code LLMs) are increasingly gaining popularity and capabilities, offering a wide array of functionalities such as code completion, code generation, code summarization, test generation, code translation, and more. To leverage code LLMs to their full potential, developers must provide code-specific contextual information to the models. This information is typically derived and distilled using program analysis tools. However, there exists a significant gap: these static analysis tools are often language-specific and come with a steep learning curve, making their effective use challenging. These tools are tailored to specific programming languages, requiring developers to learn and manage multiple tools to cover various aspects of their code base. Moreover, the complexity of configuring and integrating these tools into existing development environments adds another layer of difficulty. This challenge limits the potential benefits that could be gained from more widespread and effective use of static analysis in conjunction with LLMs. To address this challenge, we present codellm-devkit (hereafter, "CLDK"), an open-source library that significantly simplifies the process of performing program analysis at various levels of granularity for different programming languages to support code LLM use cases. As a Python library, CLDK offers developers an intuitive and user-friendly interface, making it incredibly easy to provide rich program analysis context to code LLMs. With this library, developers can effortlessly integrate detailed, code-specific insights that enhance the operational efficiency and effectiveness of LLMs in coding tasks. CLDK is available as an open-source library at https://github.com/IBM/codellm-devkit.
653			\|a Learning curves
653			\|a Python
653			\|a Program verification (computers)
653			\|a Source code
653			\|a Static code analysis
653			\|a Large language models
653			\|a Libraries
653			\|a Open source software
653			\|a Programming languages
653			\|a Effectiveness
700	1		\|a Pan, Rangeet
700	1		\|a Pavuluri, Raju
700	1		\|a Tamilselvam, Srikanth
700	1		\|a Vukovic, Maja
700	1		\|a Sinha, Saurabh
773	0		\|t arXiv.org \|g (Oct 16, 2024), p. n/a
786	0		\|d ProQuest \|t Engineering Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3118116875/abstract/embedded/BH75TPHOCCPB476R?source=fedsrch
856	4	0	\|3 Full text outside of ProQuest \|u http://arxiv.org/abs/2410.13007