Improving Cloud Data Processing and Storage
Saved in:
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main Author: | Wang, Ziheng |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | Data processing; Workers; Fault tolerance; Pareto optimum; Computer science |
| Online Access: | Citation/Abstract, Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3275492476 | ||
| 003 | UK-CbPIL | ||
| 020 | |a 9798265429520 | ||
| 035 | |a 3275492476 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 66569 |2 nlm | ||
| 100 | 1 | |a Wang, Ziheng | |
| 245 | 1 | |a Improving Cloud Data Processing and Storage | |
| 260 | |b ProQuest Dissertations & Theses |c 2025 | ||
| 513 | |a Dissertation/Thesis | ||
| 520 | 3 | |a SQL is not merely a query language -- it is a state of mind. To think in SQL is to view reality through the lens of sets and predicates. A crowded room becomes a table of persons, each with attributes that can be filtered, grouped, and aggregated. Conversations become transactions, friendships become foreign keys, and communities emerge from inner and outer joins. We normalize our thoughts, decomposing complex ideas into atoms that can be recomposed through relational algebra. We seek primary keys in every domain -- those unique identifiers that anchor understanding. We think in terms of constraints and integrity, recognizing that truth emerges not from individual records but from the relationships between them. Each computing epoch has demanded its own translation of this relational philosophy into silicon and wire. From mainframes executing batch jobs to client-server architectures, each generation has reimagined how to manifest set-theoretic operations in the medium of their time. Today, cloud computing presents us with new primitives: ephemeral compute, disaggregated storage, and elastic scale. Our challenge is not to abandon or even evolve the relational creed, but to discover how its eternal truths can flourish when tables grow to petabytes, when compute materializes on demand, and when the "database server" dissolves into a constellation of different hosted services. This dissertation explores how to realize the relational vision in the cloud era. We begin by improving distributed query processing through two key innovations: balancing fault recovery with pipelined execution in streaming dataflow systems, and reasoning about query execution on heterogeneous compute resources. We then turn to the storage layer, showing how to optimize cloud-native data lakes for selective queries by building consistent, bolt-on indices over object storage. We demonstrate these principles through a concrete implementation for log search, showcasing how relational operations can efficiently navigate massive volumes of semi-structured data. We hope the reader will come to appreciate how the synthesis of distributed systems theory and cloud engineering practice allows the relational model to flourish beyond its traditional confines without sacrificing its essential beauty. | |
| 653 | |a Data processing | ||
| 653 | |a Workers | ||
| 653 | |a Fault tolerance | ||
| 653 | |a Pareto optimum | ||
| 653 | |a Computer science | ||
| 773 | 0 | |t ProQuest Dissertations and Theses |g (2025) | |
| 786 | 0 | |d ProQuest |t ProQuest Dissertations & Theses Global | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3275492476/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3275492476/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |