Declarative Analytics on Heterogeneous HPC Systems

Bibliographic Details
Published in:	ProQuest Dissertations and Theses (2025)
Main Author:	Shovon, Ahmedur Rahman
Published:	ProQuest Dissertations & Theses
Subjects:	Computer science; Computer engineering; Information science

Description
Abstract:	The emergence of exascale systems marks a transformative era in high-performance computing (HPC), powered by extensive use of GPUs. The popularity of GPGPUs in HPC, driven by performance gains and power efficiency, demands redesigning traditional algorithms to exploit GPU parallelism. Declarative languages such as Datalog, however, can leverage these advancements directly: they express complex problems through simple rules and queries, which can be efficiently compiled into relational algebra operations for execution on GPGPUs. Integrating Datalog's declarative syntax with GPGPU computational power enables scalable declarative analytics across big data, graph mining, and program analysis on HPC systems.

While recent advancements have focused on multi-threaded and multi-core implementations of Datalog, the evolution of exascale systems presents a compelling opportunity to extend Datalog's capabilities to multi-node, multi-GPU environments. This thesis addresses this gap by developing the first multi-GPU, multi-node Datalog engine. First, we investigate the parallelization of iterated operations involving relational algebra primitives on GPUs, which are fundamental to Datalog operations. Then, we address challenges specific to heterogeneous architectures, including optimized communication strategies, recursive aggregation techniques, and efficient join operations, all tailored for a heterogeneous Datalog backend. We focus on optimizing specialized Datalog implementations for graph algorithms, including path-finding and topology-based feature extraction. For testing and benchmarking of the algorithms, we use publicly available datasets from the Stanford Large Network Dataset Collection and the SuiteSparse Matrix Collection.
Our research extends beyond traditional graph mining and program analysis, exploring Datalog's potential in emerging domains such as topological data analysis, machine learning, and visual analytics for high-dimensional data. Evaluating power consumption alongside performance is increasingly vital in HPC systems, as energy efficiency significantly impacts operational sustainability and cost-effectiveness. Thus, we conduct power analysis across GPU-based Datalog engines, which differ primarily in their recursive join strategies and underlying data structures. We evaluate how variations in implementation techniques for the same application, executed on identical hardware and datasets, influence power consumption. By advancing Datalog's applicability in exascale environments, we aim to demonstrate its scalability and suitability for high-performance, energy-efficient analysis of complex data on next-generation computing platforms.
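Illustrative note: the abstract describes Datalog rules being compiled into iterated relational algebra operations, with path-finding as one target workload. The following is a minimal CPU-only sketch of that idea, not the dissertation's engine. It evaluates the standard two-rule reachability program with semi-naive iteration, where each round is a hash join, a projection, and a deduplication step. The function name `transitive_closure` and the variable names are hypothetical and chosen only for illustration.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the Datalog program (illustrative only):

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    """
    # Hash index on the first column of edge, so the join can look up
    # all z with edge(y, z) for a given y.
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts that were new in the previous iteration

    while delta:
        # Join delta(x, y) with edge(y, z) and project onto (x, z).
        derived = {(x, z) for (x, y) in delta for z in succ.get(y, ())}
        # Deduplicate: keep only facts not already known.
        delta = derived - path
        path |= delta

    return path


edges = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(edges)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The loop stops at a fixpoint, when an iteration derives no new facts. Joining only the `delta` relation rather than the full `path` relation avoids re-deriving facts from earlier rounds; the join, projection, and set-difference steps are the relational primitives that a GPU backend would parallelize.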
ISBN:	9798263306762
Source:	ProQuest Dissertations & Theses Global