Enhancing and Hardening Neural Code Model
| Published in: | PQDT - Global (2025) |
|---|---|
| Main author: | Li, Zongjie |
| Publisher: | ProQuest Dissertations & Theses |
| Subjects: | |
| Links: | Citation/Abstract · Full Text - PDF · Full text outside of ProQuest |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3273630411 | ||
| 003 | UK-CbPIL | ||
| 020 | |a 9798263313142 | ||
| 035 | |a 3273630411 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 189128 |2 nlm | ||
| 100 | 1 | |a Li, Zongjie | |
| 245 | 1 | |a Enhancing and Hardening Neural Code Model | |
| 260 | |b ProQuest Dissertations & Theses |c 2025 | ||
| 513 | |a Dissertation/Thesis | ||
| 520 | 3 | |a With the rapid advancement of deep learning technologies, neural code models have achieved remarkable success, facilitating significant breakthroughs across various code-related applications. Leveraging powerful computational resources and massive training data, these models demonstrate sophisticated capabilities in understanding, analyzing, and generating diverse programming code. Unlike models primarily designed for natural language tasks, code models are typically engineered for integration into various productivity scenarios and practical development workflows. Consequently, developing neural code models with high accuracy, reliability, and freedom from potential intellectual property risks has become imperative. This thesis proposal focuses on designing and developing neural code models through four key aspects: 1) enhancing model performance through data augmentation and architectural improvements, 2) refining output consistency through code structure and semantic analysis, 3) incorporating verifiable watermarks to protect intellectual property, and 4) synthesizing domain-specific datasets for code models. In our first contribution, we present a framework that leverages compiler-generated Intermediate Representation (IR) code for data augmentation, enabling improved embeddings that support various downstream code applications. To further enhance code generation capabilities, our second work introduces CCTEST, a system that inserts context-free code snippets to detect and rectify inconsistencies. In our third work, we exploit programming language semantics and token distribution characteristics to embed verifiable watermarks in model outputs, thereby enhancing model security and intellectual property protection. In our fourth work, we propose a novel approach to synthesizing domain-specific datasets for fine-tuning code models, addressing the challenges of data scarcity and quality in specialized domains. | |
| 653 | |a Language | ||
| 653 | |a Readability | ||
| 653 | |a Software | ||
| 653 | |a Large language models | ||
| 653 | |a Natural language | ||
| 653 | |a Computer engineering | ||
| 773 | 0 | |t PQDT - Global |g (2025) | |
| 786 | 0 | |d ProQuest |t ProQuest Dissertations & Theses Global | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3273630411/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3273630411/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch |
| 856 | 4 | 0 | |3 Full text outside of ProQuest |u https://doi.org/10.14711/thesis-hdl152440 |