Program Synthesis From Natural Language Using Language Models

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2024)
Main Author: Ni, Ansong
Published:
ProQuest Dissertations & Theses
Subjects: Computer science; Artificial intelligence; Information science
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3164073952
003 UK-CbPIL
020 |a 9798302887467 
035 |a 3164073952 
045 2 |b d20240101  |b d20241231 
084 |a 66569  |2 nlm 
100 1 |a Ni, Ansong 
245 1 |a Program Synthesis From Natural Language Using Language Models 
260 |b ProQuest Dissertations & Theses  |c 2024 
513 |a Dissertation/Thesis 
520 3 |a Programming is a ubiquitous problem-solving tool and a generic method of instructing machines. However, mastering programming skills can take months or even years of practice. Thus, making programming more accessible has been a key problem in computer science since its inception. More specifically, generating programs directly from human inputs (e.g., natural language instructions) has been a longstanding challenge in artificial intelligence, due to the ambiguity of natural language and the precision required in programming. Recent advances in large language models (LLMs) have shown great potential in this area. These LLMs, typically neural networks with billions of parameters, are trained on trillions of tokens of text and code, equipping them with the ability to understand natural language and generate code. With their capability for program synthesis from natural language, such LLMs can power real-world applications such as virtual personal assistants, AI pair-programming, robotics control, and natural language interfaces for data queries and visualization. While LLMs have greatly pushed the frontier of program synthesis from natural language inputs and achieved state-of-the-art performance on various code generation benchmarks, their coding capabilities still lag significantly behind those of human programmers. To further improve the capabilities of LLMs in synthesizing programs from natural language inputs, several challenges need to be addressed: 1) LLMs are data- and compute-hungry, making it expensive to train or finetune such models; 2) although LLMs are adept at generating plausible code, they typically lack the ability to model and reason about program execution; 3) there is a lack of comprehensive evaluation of the language-to-code generation abilities of LLMs. Addressing the difficulty of obtaining large amounts of training data, we first present a self-training framework for program synthesis using LLMs, in which we use the model to sample additional programs to augment the ground-truth program for learning. This involves an iterative process of program sampling, execution-based filtering, and learning from the correct samples stored in a cache. Moreover, we propose the notion of partial correctness and demonstrate that we can further improve learning efficiency and model performance by learning not only from fully correct self-sampled programs but also from partially correct ones. To equip LLMs with the ability to reason about program execution, an essential skill for complex coding tasks such as debugging, we train them to understand execution traces and generate natural language chain-of-thought rationales. We first design an LLM-friendly trace representation and then reuse the self-training framework, prompting the model to generate natural language rationales first and the code outputs second. Although we filter only by the correctness of the code outputs, this typically also filters out low-quality rationales. With iterative training on high-quality rationales and the corresponding correct code outputs, we show that LLMs can improve their ability to reason about program execution. Such rationales mimic the reasoning process of human programmers and can also be used to communicate directly with users, improving the interpretability of LLMs on coding tasks such as program repair.
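As a rough illustration of the execution-based filtering step in the self-training framework summarized above, the Python sketch below keeps self-sampled programs that pass all (or, optionally, some) of a task's test cases. All names are hypothetical, and the test-pass fraction is only a simplified stand-in for the dissertation's notion of partial correctness, not its actual definition or implementation.

from dataclasses import dataclass

@dataclass
class Task:
    nl_instruction: str
    test_cases: list            # list of (args, expected_output) pairs
    entry_point: str = "solution"

def pass_fraction(program_src: str, task: Task) -> float:
    """Execute a candidate and return the fraction of test cases it passes
    (1.0 = fully correct; values in (0, 1) act as a crude proxy for partial correctness)."""
    namespace = {}
    try:
        exec(program_src, namespace)        # run the candidate in a fresh namespace
        fn = namespace[task.entry_point]
        passed = sum(1 for args, expected in task.test_cases if fn(*args) == expected)
        return passed / len(task.test_cases)
    except Exception:
        return 0.0

def filter_self_samples(task: Task, candidates: list, keep_partial: bool = True) -> list:
    """Execution-based filtering: keep fully (and optionally partially) correct samples
    so they can be cached and used for the next round of fine-tuning."""
    kept = []
    for src in candidates:
        score = pass_fraction(src, task)
        if score == 1.0 or (keep_partial and score > 0.0):
            kept.append((task.nl_instruction, src, score))
    return kept

# Toy usage: two self-sampled candidates, one correct and one buggy.
task = Task("Return the sum of two integers.", [((1, 2), 3), ((0, 5), 5)])
candidates = [
    "def solution(a, b):\n    return a + b",    # passes both tests
    "def solution(a, b):\n    return a - b",    # passes neither test
]
print(filter_self_samples(task, candidates))    # only the correct sample survives filtering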
To address the high computational cost of finetuning large models to understand program execution, we demonstrate that it is possible to train a smaller model as a verifier to assess the program candidates sampled from the LLMs for each natural language input. These verifiers learn to judge the correctness of the generated programs along with their execution results, and to rerank the program samples based on both the generation probability from the LLMs and the verification probability from the smaller verifier models. Lastly, to obtain a comprehensive understanding of the language-to-code generation capabilities of LLMs, this dissertation also presents an evaluation on 7 benchmarks from three key domains: semantic parsing, math reasoning, and Python programming. We evaluate a total of 56 models of various sizes, training data mixtures, and training methods, with the aim of studying the effect of these factors on model performance across different tasks. Targeting the data contamination issue in code generation benchmarks, we also present a pipeline that identifies contaminated examples with both surface- and semantic-level searches, and we quantify the level of contamination for two commonly used datasets.
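The verifier-based reranking described above can be sketched as combining the generator's log-probability with the verifier's correctness estimate. The scores, the equal-weight combination, and all names in the Python sketch below are illustrative assumptions, not the dissertation's exact formulation.

import math
from dataclasses import dataclass

@dataclass
class Candidate:
    program: str
    exec_result: str
    lm_logprob: float       # log p(program | NL input) under the generating LLM
    verifier_prob: float    # verifier's estimate that program and exec_result are correct

def rerank(candidates, alpha: float = 1.0):
    """Order candidates by log p_LM + alpha * log p_verifier (higher is better)."""
    def score(c: Candidate) -> float:
        return c.lm_logprob + alpha * math.log(max(c.verifier_prob, 1e-9))
    return sorted(candidates, key=score, reverse=True)

# Toy usage: the verifier demotes a fluent-looking but likely-incorrect candidate.
cands = [
    Candidate("def f(x): return x * 2", "4", lm_logprob=-1.2, verifier_prob=0.9),
    Candidate("def f(x): return x + 2", "4", lm_logprob=-0.8, verifier_prob=0.1),
]
print(rerank(cands)[0].program)    # the verifier-preferred candidate is ranked first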
653 |a Computer science 
653 |a Artificial intelligence 
653 |a Information science 
773 0 |t ProQuest Dissertations and Theses  |g (2024) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3164073952/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3164073952/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch