Investigating skew effects in shared-nothing parallel database systems

Bibliographic details
Published in:	ProQuest Dissertations and Theses (1993)
Main author:	Hu, Ron-Chung
Subjects:	Computer science

Description
Abstract:	Larger databases and cheaper hardware have generated great interest in running database applications on parallel architectures. A database system built from multiple processors that share nothing (i.e., share neither main memory nor disks) is one way to provide the functionality of a conventional DBMS. To exploit parallelism, a shared-nothing parallel system horizontally partitions each data relation across all the processors. Proponents of this loosely coupled approach claim that such an architecture can achieve high scalability and good cost-performance. However, the effectiveness of parallel execution on a shared-nothing system depends on our ability to divide the load equally among the nodes while minimizing coordination overhead. In this dissertation, we investigate skew effects in parallel database systems; if handled improperly, they frequently cause load imbalance and degrade system performance. We discuss the nature of skew effects and why they cause performance problems. To take full advantage of parallel execution, we study three major performance-oriented topics: query optimization, index mechanisms, and parallel join operations. For each topic, we illustrate the flaws in existing methods, which are often straightforward generalizations of conventional database techniques applied to parallel database systems. In query optimization, we propose a two-level query optimization approach in which query optimization functions are split between the system level and the node level. We propose migrating every decision that must consider an individual node's data distribution to the node level, and we show that this approach is especially beneficial for large parallel systems, which are vulnerable to various skew effects. In index mechanisms, we present a unified index mechanism that incorporates both local and distributive mechanisms simultaneously in a single index.
We perform simulation experiments to validate the effectiveness of this new index mechanism and devise a two-threshold mechanism to maintain it efficiently. In parallel join operations, we introduce two modified parallel hash join algorithms that use tuple duplication and partial duplication schemes, respectively, and we identify the domains in which they perform well. By extending our knowledge of effective parallel execution, this research contributes an essential step toward a high-performance database system.
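The abstract names the duplication-based join algorithms without describing them, so the following is only a generic, single-process Python sketch of the problem they target, not the author's method: when relations are hash-partitioned on the join key, one frequent key piles its tuples onto a single node, and replicating the matching build tuples lets that work be spread out. The helper names, the integer keys, the 10% hot-key threshold (`hot_fraction`), and the round-robin spreading of hot probe tuples are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical setup: tuples are (join_key, payload) with non-negative integer
# keys, and "node i" is simulated by list index i in a single process.


def home_node(key, n_nodes):
    """Hash placement used for horizontal partitioning (key modulo node count)."""
    return key % n_nodes


def local_hash_join(r_frag, s_frag):
    """Join one node's fragments: build a hash table on R, probe it with S."""
    table = defaultdict(list)
    for key, r_payload in r_frag:
        table[key].append(r_payload)
    return [(key, r_payload, s_payload)
            for key, s_payload in s_frag
            for r_payload in table.get(key, [])]


def partition_plain(R, S, n_nodes):
    """Plain hash partitioning: every tuple goes to its key's home node."""
    r_parts = [[] for _ in range(n_nodes)]
    s_parts = [[] for _ in range(n_nodes)]
    for t in R:
        r_parts[home_node(t[0], n_nodes)].append(t)
    for t in S:
        s_parts[home_node(t[0], n_nodes)].append(t)
    return r_parts, s_parts


def partition_partial_dup(R, S, n_nodes, hot_fraction=0.1):
    """Assumed skew handling: replicate R tuples of hot keys to every node and
    spread the matching S tuples round-robin; other keys are hash-partitioned."""
    freq = Counter(key for key, _ in S)
    hot = {k for k, c in freq.items() if c > hot_fraction * len(S)}
    r_parts = [[] for _ in range(n_nodes)]
    s_parts = [[] for _ in range(n_nodes)]
    for t in R:
        if t[0] in hot:
            for part in r_parts:  # duplicate the build tuple on all nodes
                part.append(t)
        else:
            r_parts[home_node(t[0], n_nodes)].append(t)
    next_node = 0
    for t in S:
        if t[0] in hot:
            s_parts[next_node].append(t)  # each hot probe tuple lands on exactly one node
            next_node = (next_node + 1) % n_nodes
        else:
            s_parts[home_node(t[0], n_nodes)].append(t)
    return r_parts, s_parts, hot


def parallel_join(r_parts, s_parts):
    """Run the local joins and report per-node load (tuples handled)."""
    loads = [len(r) + len(s) for r, s in zip(r_parts, s_parts)]
    rows = [row for r, s in zip(r_parts, s_parts) for row in local_hash_join(r, s)]
    return rows, loads


def skew_ratio(loads):
    """Busiest node's load divided by the mean load (1.0 means perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    N = 4
    R = [(k, f"r{k}") for k in range(100)]
    # Key 7 is a heavy hitter in S: 600 extra tuples share it.
    S = [(7, f"s{i}") for i in range(600)] + [(k, f"t{k}") for k in range(100)]

    plain_rows, plain_loads = parallel_join(*partition_plain(R, S, N))
    r_parts, s_parts, hot = partition_partial_dup(R, S, N)
    dup_rows, dup_loads = parallel_join(r_parts, s_parts)

    assert sorted(plain_rows) == sorted(dup_rows)  # both plans give the same answer
    print("hot keys:", hot)
    print("plain loads:", plain_loads, "skew:", round(skew_ratio(plain_loads), 2))
    print("partial-dup loads:", dup_loads, "skew:", round(skew_ratio(dup_loads), 2))
```

Run as a script, plain partitioning gives per-node loads of [50, 50, 50, 650] (skew 3.25), while the duplication variant gives [202, 201, 201, 199] (skew about 1.01); the price is three extra copies of the hot R tuple. The exact `Counter` pass over S is a simplification made for the sketch; a real system would have to identify hot keys some other way, for example from statistics or sampling.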
ISBN:	9798208316412
Source:	ProQuest Dissertations & Theses Global