SynopsisDB: A Distributed Data System Supports In-System Data Exploration
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main author: | |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | |
| Online access: | Citation/Abstract, Full Text - PDF |
| Abstract: | In the era of big data, domain experts commonly begin their analysis by exploring diverse datasets to gain meaningful insights. The concept of the Data Lake has emerged in recent years as a modern solution for storing and managing data from heterogeneous sources. It has quickly become the mainstream storage paradigm in industry, with widely adopted platforms such as Amazon Lake Formation, Azure Data Lake, and Google BigLake. In this thesis, we present a distributed data processing system named SynopsisDB, designed to support large-scale data exploration over data lakes. SynopsisDB consists of three layers: the storage layer, the query processing layer, and the user interface layer. The storage layer manages thousands of data files, combining the storage engines of data lakes with a local Log-Structured Merge (LSM) tree-based engine. The data lake files are stored in the Hadoop Distributed File System (HDFS), while the local engine runs on a NewSQL database system that extends the leveled LSM-tree architecture, referred to as Bi-LSM. The query processing layer features a component called SynopsisLake, which extends the Data Lakehouse architecture to manage and query thousands of data synopses. SynopsisLake bridges the gap between traditional query optimization techniques from Database Management Systems (DBMSs) and Data Warehouses and the heterogeneous, multi-resolution nature of data synopses in modern data lakes. The user interface layer supports three key operations: approximate query processing, progressive query processing, and progressive query visualization. These capabilities enable domain experts to explore their data efficiently, gain early insights, and interactively refine their queries within a short time. Together, these contributions make SynopsisDB a comprehensive and practical system for scalable, synopsis-driven data exploration in the age of big data. |
|---|---|
| ISBN: | 9798263308971 |
| Source: | ProQuest Dissertations & Theses Global |