Neural Models for Source Code Synthesis and Completion

Bibliographic Details
Published in: arXiv.org (Feb 8, 2024), p. n/a
Main Author: Niyogi, Mitodru
Published: Cornell University Library, arXiv.org
Subjects: Translating; Data augmentation; Performance enhancement; Programming environments; Semantics; Source code; Syntax; Natural language processing; Programming languages; Translations
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2925759191
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2925759191 
045 0 |b d20240208 
100 1 |a Niyogi, Mitodru 
245 1 |a Neural Models for Source Code Synthesis and Completion 
260 |b Cornell University Library, arXiv.org  |c Feb 8, 2024 
513 |a Working Paper 
520 3 |a Natural language (NL) to code suggestion systems assist developers in Integrated Development Environments (IDEs) by translating NL utterances into compilable code snippets. Current approaches mainly involve hard-coded, rule-based systems built on semantic parsing. These systems make heavy use of hand-crafted rules that map patterns in NL, or elements of its syntax parse tree, to various query constructs, and they work only on a limited subset of NL with restricted syntax. They cannot extract semantic information from the developer's coding intent, and they often fail to infer the types, names, and context of the source code needed for accurate system-level code suggestions. In this master's thesis, we present sequence-to-sequence deep learning models and training paradigms that map NL to general-purpose programming languages, suggesting source code snippets given an NL intent and extending source-code auto-completion for users as they write. The developed architecture incorporates contextual awareness into neural models that generate source code tokens directly, rather than generating parse trees or abstract meaning representations and converting them back to source code. The proposed pretraining strategy and data augmentation techniques improve the performance of the proposed architecture, which exceeds the neural semantic parser TranX by 10.82% on the BLEU-4 metric. Thereafter, a finer-grained analysis of the parsable code translations from NL intents was introduced for the CoNaLa challenge. The proposed system is bidirectional, as it can also be used to generate NL code documentation given source code. Lastly, a RoBERTa masked language model for Python was proposed to extend the developed system to code completion. 
653 |a Translating 
653 |a Data augmentation 
653 |a Performance enhancement 
653 |a Programming environments 
653 |a Semantics 
653 |a Source code 
653 |a Syntax 
653 |a Natural language processing 
653 |a Programming languages 
653 |a Translations 
773 0 |t arXiv.org  |g (Feb 8, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2925759191/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2402.06690
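
The abstract above reports a 10.82% BLEU-4 improvement over the TranX semantic parser. For orientation, here is a minimal sketch of how corpus-level BLEU-4 is conventionally computed over tokenized code with NLTK; the token sequences below are hypothetical examples, not data from the thesis.

    # Illustrative BLEU-4 scoring of a generated code snippet against a
    # reference translation; the token sequences are made-up examples.
    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    references = [[["df", ".", "drop", "(", "columns", "=", "cols", ")"]]]
    hypotheses = [["df", ".", "drop", "(", "cols", ",", "axis", "=", "1", ")"]]

    # Smoothing avoids zero scores when short snippets lack 4-gram overlap.
    smooth = SmoothingFunction().method1
    bleu4 = corpus_bleu(references, hypotheses,
                        weights=(0.25, 0.25, 0.25, 0.25),
                        smoothing_function=smooth)
    print(f"BLEU-4: {bleu4:.4f}")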
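
The abstract also proposes a RoBERTa masked language model for Python to drive code completion. A minimal sketch of querying such a model through the Hugging Face transformers fill-mask pipeline follows; "roberta-base" is a placeholder checkpoint, since the thesis pretrains its own model on Python source.

    # Illustrative masked-token completion with a RoBERTa-style model.
    # "roberta-base" stands in for the thesis's Python-pretrained MLM.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="roberta-base")
    # RoBERTa models use "<mask>" as the mask token.
    for candidate in fill("import numpy as <mask>"):
        print(candidate["token_str"], round(candidate["score"], 3))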