Domain-Specific Benchmarks and Architectures for Applications Using Graph-Based Data

Bibliographic Details
Published in:	ProQuest Dissertations and Theses (2025)
Main Author:	McCrabb, Andrew
Published:	ProQuest Dissertations & Theses
Subjects:	Computer engineering; Computer science; Information technology
Online Access:	Citation/Abstract; Full Text - PDF

Description
Abstract:	Graph-based processing enables many applications in logistics, e-commerce, social media, and more. However, graph workloads are slow: they are bottlenecked not by compute power, but by inefficient data access. As useful graphs get larger and graph-based algorithms become more complex, adding more powerful compute units like Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs) cannot keep up with the increasing size and complexity of these workloads. To address these challenges, we first introduce a novel taxonomy of graph-based algorithms: those that treat graphs (1) as data frameworks, (2) as algorithmic frameworks, or (3) as both. Each category has overlapping needs: higher memory bandwidth, better data organization, and greater thread-level parallelism. Next, we demonstrate that custom processing-in-memory (PIM) hardware accelerators are effective and energy-efficient solutions to the compute and memory bottlenecks of graph-based applications. Specifically, we propose and evaluate three custom PIM accelerators, DREDGE (for graph-as-data-framework applications), ACRE (for graph-as-algorithmic-framework applications), and GLEAM (for graph-as-both applications), each targeting one of the three categories of graph applications. DREDGE targets dynamic graph applications by introducing a novel partitioning technique and dedicated hardware support to continuously improve data organization in memory. ACRE accelerates the training of tree-based machine learning models in a way that allows users to better understand the models’ reasoning. GLEAM targets graph neural networks, the primary machine learning models for graph-based data, accelerating the node aggregation operations that bottleneck training and inference. These three designs offer a 2.5-14x speedup for their respective applications, and they save 77-93% of total system energy over their respective baselines. Each design fits within the logic area of modern 3D-stacked memory: 0.3-13% of the available logic space. Finally, we present two benchmark suites, DyGraph and BeXAI, making them publicly available to support future research in dynamic graph processing and explainable machine learning acceleration. Together, these contributions enable efficient and scalable graph computing to handle the demands of tomorrow’s graph workloads.
ISBN:	9798297610064
Source:	ProQuest Dissertations & Theses Global