Neural Models for Source Code Synthesis and Completion

Bibliographic Details
Published in: arXiv.org (Feb 8, 2024), p. n/a
Main Author: Niyogi, Mitodru
Published: Cornell University Library, arXiv.org
Subjects: Translating; Data augmentation; Performance enhancement; Programming environments; Semantics; Source code; Syntax; Natural language processing; Programming languages; Translations
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2925759191
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2925759191 
045 0 |b d20240208 
100 1 |a Niyogi, Mitodru 
245 1 |a Neural Models for Source Code Synthesis and Completion 
260 |b Cornell University Library, arXiv.org  |c Feb 8, 2024 
513 |a Working Paper 
520 3 |a Natural language (NL) to code suggestion systems assist developers in Integrated Development Environments (IDEs) by translating NL utterances into compilable code snippets. Current approaches mainly involve hard-coded, rule-based systems built on semantic parsing. These systems make heavy use of hand-crafted rules that map patterns in NL, or elements of its syntax parse tree, to various query constructs, and they work only on a limited subset of NL with restricted syntax. They cannot extract semantic information from the developer's coding intent, and they often fail to infer the types, names, and context of the source code needed for accurate system-level code suggestions. In this master's thesis, we present sequence-to-sequence deep learning models and training paradigms that map NL to general-purpose programming languages, suggesting source code snippets given an NL intent and extending source-code auto-completion for users as they write. The developed architecture incorporates contextual awareness into neural models that generate source code tokens directly, rather than generating parse trees or abstract meaning representations and converting them back to source code. The proposed pretraining strategy and data augmentation techniques improve the performance of the proposed architecture, which exceeds the neural semantic parser TranX by 10.82% on the BLEU-4 metric. Thereafter, a finer-grained analysis of the parsable code translations from NL intents was introduced for the CoNaLa challenge. The proposed system is bidirectional, as it can also be used to generate NL code documentation given source code. Lastly, a RoBERTa masked language model for Python was proposed to extend the developed system to code completion. 
653 |a Translating 
653 |a Data augmentation 
653 |a Performance enhancement 
653 |a Programming environments 
653 |a Semantics 
653 |a Source code 
653 |a Syntax 
653 |a Natural language processing 
653 |a Programming languages 
653 |a Translations 
773 0 |t arXiv.org  |g (Feb 8, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2925759191/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2402.06690
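
The abstract above reports a 10.82% BLEU-4 improvement over the TranX semantic parser. For orientation, here is a minimal sketch of how corpus-level BLEU-4 is conventionally computed over tokenized code with NLTK; the token sequences below are hypothetical examples, not data from the thesis.

    # Illustrative BLEU-4 scoring of a generated code snippet against a
    # reference translation; the token sequences are made-up examples.
    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    references = [[["df", ".", "drop", "(", "columns", "=", "cols", ")"]]]
    hypotheses = [["df", ".", "drop", "(", "cols", ",", "axis", "=", "1", ")"]]

    # Smoothing avoids zero scores when short snippets lack 4-gram overlap.
    smooth = SmoothingFunction().method1
    bleu4 = corpus_bleu(references, hypotheses,
                        weights=(0.25, 0.25, 0.25, 0.25),
                        smoothing_function=smooth)
    print(f"BLEU-4: {bleu4:.4f}")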
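
The abstract also proposes a RoBERTa masked language model for Python to drive code completion. A minimal sketch of querying such a model through the Hugging Face transformers fill-mask pipeline follows; "roberta-base" is a placeholder checkpoint, since the thesis pretrains its own model on Python source.

    # Illustrative masked-token completion with a RoBERTa-style model.
    # "roberta-base" stands in for the thesis's Python-pretrained MLM.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="roberta-base")
    # RoBERTa models use "<mask>" as the mask token.
    for candidate in fill("import numpy as <mask>"):
        print(candidate["token_str"], round(candidate["score"], 3))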