Hardening the Fuzzing Ecosystem Through Automated Seed Generation, Report Deduplication, and Patch Correctness Validation

Bibliographic details
Published in: ProQuest Dissertations and Theses (2025)
Main author: Wu, Yuhang
Published:
ProQuest Dissertations & Theses
Subjects: Computer science; Computer engineering; Information technology
Online access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3281642417
003 UK-CbPIL
020 |a 9798265483034 
035 |a 3281642417 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Wu, Yuhang 
245 1 |a Hardening the Fuzzing Ecosystem Through Automated Seed Generation, Report Deduplication, and Patch Correctness Validation 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Fuzz testing has long served as a foundational methodology for discovering software vulnerabilities, enabling dynamic exploration of execution paths through structured yet randomized input mutations. Fuzz testing engines such as American Fuzzy Lop (AFL), libFuzzer, and syzkaller have driven significant advances in automated vulnerability discovery and program state coverage. Despite these successes, recent evidence suggests that improving fuzzer mutation strategies, scheduling heuristics, or feedback metrics is no longer sufficient to strengthen the broader fuzzing-centered software security ecosystem. A significant portion of the practical bottlenecks arises not within the act of fuzzing, but in the stages that precede and follow it. This dissertation argues that three key components (seed preparation, bug report triage, and patch validation) constitute the true limiting factors for achieving scalable, trustworthy fuzz-driven security analysis. These stages collectively determine whether increased fuzzing throughput can translate into effective vulnerability discovery and timely remediation. First, the quality of input seeds plays a pivotal role in guiding fuzzers toward deep, semantically meaningful program states. Manually crafting such seeds is labor-intensive and fundamentally non-scalable for complex software targets. I analyze the shortcomings of existing seed corpora and develop automated techniques to generate structurally valid, format-conscious, and semantically rich seeds tailored to program functionality. These techniques substantially improve code coverage and bug exposure while eliminating manual engineering overhead. Second, the explosion in vulnerability reports generated by modern fuzzers has created severe triage challenges. For example, syzkaller alone uncovered 3,736 Linux kernel bugs in only three years, yet nearly half of these reports were duplicates, imposing significant delays and resource burdens on maintainers. My work proposes more accurate, scalable mechanisms for deduplicating fuzz-generated bug reports, mitigating long backlogs, and enabling maintainers to focus on genuinely distinct and high-impact issues. Finally, the surge in discovered bugs naturally leads to a corresponding increase in patches. However, patch quality has not kept pace with patch volume. Approximately 6% of Linux kernel patches are incorrect, with a subset introducing new security flaws or failing to fully address the underlying vulnerability. To address this, I design and implement Klaus, a system that detects incorrect or incomplete patches by analyzing patch behavior and its impact on system execution. Evaluated on real-world kernel patches, Klaus identifies previously unnoticed regressions and security-relevant inconsistencies, highlighting its practical value for maintaining the integrity of large, security-critical codebases. Together, these contributions reinforce key operational pillars of the fuzzing lifecycle that lie beyond the design of the fuzzer itself. By strengthening seed generation, streamlining large-scale bug triage, and safeguarding patch correctness, this work advances the reliability, scalability, and long-term sustainability of fuzz-based software security analysis. 
653 |a Computer science 
653 |a Computer engineering 
653 |a Information technology 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3281642417/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3281642417/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch