Improving Cloud Data Processing and Storage

Bibliographic Details
Published in:	ProQuest Dissertations and Theses (2025)
Main Author:	Wang, Ziheng
Published:	ProQuest Dissertations & Theses
Subjects:	Data processing; Workers; Fault tolerance; Pareto optimum; Computer science
Online Access:	Citation/Abstract; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3275492476
003	UK-CbPIL
020			\|a 9798265429520
035			\|a 3275492476
045	2		\|b d20250101 \|b d20251231
084			\|a 66569 \|2 nlm
100	1		\|a Wang, Ziheng
245	1		\|a Improving Cloud Data Processing and Storage
260			\|b ProQuest Dissertations & Theses \|c 2025
513			\|a Dissertation/Thesis
520	3		\|a SQL is not merely a query language -- it is a state of mind. To think in SQL is to view reality through the lens of sets and predicates. A crowded room becomes a table of persons, each with attributes that can be filtered, grouped, and aggregated. Conversations become transactions, friendships become foreign keys, and communities emerge from inner and outer joins. We normalize our thoughts, decomposing complex ideas into atoms that can be recomposed through relational algebra. We seek primary keys in every domain -- those unique identifiers that anchor understanding. We think in terms of constraints and integrity, recognizing that truth emerges not from individual records but from the relationships between them. Each computing epoch has demanded its own translation of this relational philosophy into silicon and wire. From mainframes executing batch jobs to client-server architectures, each generation has reimagined how to manifest set-theoretic operations in the medium of their time. Today, cloud computing presents us with new primitives: ephemeral compute, disaggregated storage, and elastic scale. Our challenge is not to abandon or even evolve the relational creed, but to discover how its eternal truths can flourish when tables grow to petabytes, when compute materializes on demand, and when the "database server" dissolves into a constellation of different hosted services. This dissertation explores how to realize the relational vision in the cloud era. We begin by improving distributed query processing through two key innovations: balancing fault recovery with pipelined execution in streaming dataflow systems, and reasoning about query execution on heterogeneous compute resources. We then turn to the storage layer, showing how to optimize cloud-native data lakes for selective queries by building consistent, bolt-on indices over object storage. We demonstrate these principles through a concrete implementation for log search, showcasing how relational operations can efficiently navigate massive volumes of semi-structured data. We hope the reader will come to appreciate how the synthesis of distributed systems theory and cloud engineering practice allows the relational model to flourish beyond its traditional confines without sacrificing its essential beauty.
653			\|a Data processing
653			\|a Workers
653			\|a Fault tolerance
653			\|a Pareto optimum
653			\|a Computer science
773	0		\|t ProQuest Dissertations and Theses \|g (2025)
786	0		\|d ProQuest \|t ProQuest Dissertations & Theses Global
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3275492476/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3275492476/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch