Improving Cloud Data Processing and Storage

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Wang, Ziheng
Published:
ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3275492476
003 UK-CbPIL
020 |a 9798265429520 
035 |a 3275492476 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Wang, Ziheng 
245 1 |a Improving Cloud Data Processing and Storage 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a SQL is not merely a query language -- it is a state of mind. To think in SQL is to view reality through the lens of sets and predicates. A crowded room becomes a table of persons, each with attributes that can be filtered, grouped, and aggregated. Conversations become transactions, friendships become foreign keys, and communities emerge from inner and outer joins. We normalize our thoughts, decomposing complex ideas into atoms that can be recomposed through relational algebra. We seek primary keys in every domain -- those unique identifiers that anchor understanding. We think in terms of constraints and integrity, recognizing that truth emerges not from individual records but from the relationships between them. Each computing epoch has demanded its own translation of this relational philosophy into silicon and wire. From mainframes executing batch jobs to client-server architectures, each generation has reimagined how to manifest set-theoretic operations in the medium of their time. Today, cloud computing presents us with new primitives: ephemeral compute, disaggregated storage, and elastic scale. Our challenge is not to abandon or even evolve the relational creed, but to discover how its eternal truths can flourish when tables grow to petabytes, when compute materializes on demand, and when the "database server" dissolves into a constellation of different hosted services. This dissertation explores how to realize the relational vision in the cloud era. We begin by improving distributed query processing through two key innovations: balancing fault recovery with pipelined execution in streaming dataflow systems, and reasoning about query execution on heterogeneous compute resources. We then turn to the storage layer, showing how to optimize cloud-native data lakes for selective queries by building consistent, bolt-on indices over object storage. 
We demonstrate these principles through a concrete implementation for log search, showcasing how relational operations can efficiently navigate massive volumes of semi-structured data. We hope the reader will come to appreciate how the synthesis of distributed systems theory and cloud engineering practice allows the relational model to flourish beyond its traditional confines without sacrificing its essential beauty. 
653 |a Data processing 
653 |a Workers 
653 |a Fault tolerance 
653 |a Pareto optimum 
653 |a Computer science 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3275492476/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3275492476/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch