Advancing Code Intelligence With Language Models

Saved in:
Bibliographic Details
Published in: ProQuest Dissertations and Theses (2026)
Main Author: Ding, Yangruibo
Published:
ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3273669335
003 UK-CbPIL
020 |a 9798265409478 
035 |a 3273669335 
045 2 |b d20260101  |b d20261231 
084 |a 66569  |2 nlm 
100 1 |a Ding, Yangruibo 
245 1 |a Advancing Code Intelligence With Language Models 
260 |b ProQuest Dissertations & Theses  |c 2026 
513 |a Dissertation/Thesis 
520 3 |a The software development landscape is undergoing a revolutionary transformation with Large Language Models (LLMs). While LLMs show significant potential in code generation, especially in standalone function completion (e.g., LeetCode problems), their current capabilities fall short of more complex, real-world software engineering (SE) tasks, such as systematic debugging and large-scale programming and patching. The fundamental challenges for LLMs to reliably assist with realistic SE tasks lie in (1) reasoning about code-specific semantics, such as the runtime behaviors of programs, and (2) handling complex and dynamic interactions, such as managing inter-file references or contexts and supporting the iterative nature of programming. This dissertation centers on enabling LLMs to meet the real-world demands of software development, driven by three key insights. First, code semantics are rich, complex, and multi-modal (e.g., static code structure and dynamic execution are different modalities); as a programming expert, an LLM should not only generate code snippets but also deeply understand what it generates: the key properties, constraints, and runtime behaviors of programs. Second, real-world software development is inherently context-rich and interactive, involving complicated dependencies and iterative refinement. LLMs must go beyond static, single-file comprehension and be able to navigate cross-file contexts, manage dependencies, and synthesize feedback from execution. Third, evaluation approaches and metrics for code LLMs should reflect the realistic complexity of software development, quantifying model capabilities in critical applications under realistic scenarios. Therefore, we implement approaches that train LLMs to reason about comprehensive code semantics, enhance their adaptability to context-rich and iterative development workflows, and construct realistic evaluation frameworks for LLM-powered programming assistants. We have developed pre-/post-training strategies to represent and reason about program properties, dynamic executions, and multi-modal code semantics, enabling improved performance in crucial SE applications, including code generation, clone retrieval, debugging, and program repair. We have also designed novel approaches to retrieve and incorporate project-level code context for code completion, and empowered LLMs with iterative self-refinement by analyzing execution feedback. To rigorously evaluate LLMs, we constructed frameworks that uncover weaknesses limiting their practical utility, including dependency management, self-consistency, and vulnerability detection. Beyond addressing software engineering challenges, this work contributes to fundamental LLM research in code reasoning, understanding, and robust evaluation, pushing the boundaries of LLM-driven program analysis and synthesis. 
653 |a Computer science 
653 |a Computer engineering 
653 |a Artificial intelligence 
773 0 |t ProQuest Dissertations and Theses  |g (2026) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3273669335/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3273669335/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch