HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores

Bibliographic details
Published in: arXiv.org (Dec 12, 2024), p. n/a
Main author: Li, Zhonggen
Other authors: Ke, Xiangyu, Zhu, Yifan, Gao, Yunjun, Tu, Yaofeng
Published: Cornell University Library, arXiv.org
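Subjects: Sparsity; Algorithms; Computation; Graphs; Graphics processing units; Pipelining (computers); Sparse matrices; Tensors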
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3144195756
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3144195756 
045 0 |b d20241212 
100 1 |a Li, Zhonggen 
245 1 |a HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores 
260 |b Cornell University Library, arXiv.org  |c Dec 12, 2024 
513 |a Working Paper 
520 3 |a Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental operation in graph computing and analytics. However, the irregularity of real-world graphs makes efficient SpMM on GPUs challenging. Recent advances in GPU computing power and the introduction of new, efficient computing cores within GPUs offer new opportunities for acceleration. In this paper, we present HC-SpMM, a pioneering algorithm that leverages hybrid GPU cores (Tensor cores and CUDA cores) to accelerate SpMM for graphs. To adapt to the computing characteristics of different GPU cores, we investigate the impact of sparse graph features on the performance of each core type, develop a data partitioning technique for the graph adjacency matrix, and devise a novel strategy for intelligently selecting the most efficient cores to process each submatrix. Additionally, we optimize memory access and thread utilization to exploit the available computational resources fully. To support complex graph computing workloads, we integrate HC-SpMM into the GNN training pipeline. Furthermore, we propose a kernel fusion strategy to enhance data reuse, as well as a cost-effective graph layout reorganization method that mitigates the irregularity and sparsity of real-world graphs and better fits the computational models of hybrid GPU cores. Extensive experiments on 14 real-world graph datasets demonstrate that HC-SpMM achieves average speedups of 1.33x and 1.23x over state-of-the-art SpMM kernels and GNN frameworks, respectively. 
653 |a Sparsity 
653 |a Algorithms 
653 |a Computation 
653 |a Graphs 
653 |a Graphics processing units 
653 |a Pipelining (computers) 
653 |a Sparse matrices 
653 |a Tensors 
700 1 |a Ke, Xiangyu 
700 1 |a Zhu, Yifan 
700 1 |a Gao, Yunjun 
700 1 |a Tu, Yaofeng 
773 0 |t arXiv.org  |g (Dec 12, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3144195756/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.08902
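The abstract (field 520) describes SpMM over a partitioned graph adjacency matrix, with a choice of GPU core type for each submatrix. The following is a minimal, CPU-only Python sketch of those two ideas, assuming the adjacency matrix is stored in CSR form. The helper names `dense_to_csr`, `csr_spmm`, and `plan_row_windows`, the row-window size, and the density threshold are all hypothetical illustrations. They are not the paper's actual partitioning technique or core-selection strategy, which this record does not describe in detail.

```python
from typing import List, Sequence, Tuple

# CSR representation: (row pointers, column indices, values)
CSR = Tuple[List[int], List[int], List[float]]


def dense_to_csr(a: Sequence[Sequence[float]]) -> CSR:
    """Convert a dense row-major matrix to CSR, keeping only nonzeros."""
    indptr, indices, data = [0], [], []
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def csr_spmm(a: CSR, b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Reference SpMM: C = A @ B with A sparse (CSR) and B dense."""
    indptr, indices, data = a
    n_out = len(b[0])
    c = []
    for i in range(len(indptr) - 1):
        acc = [0.0] * n_out
        # Each nonzero A[i, j] scales row j of B and accumulates into row i of C.
        for k in range(indptr[i], indptr[i + 1]):
            j, v = indices[k], data[k]
            for t in range(n_out):
                acc[t] += v * b[j][t]
        c.append(acc)
    return c


def plan_row_windows(a: CSR, window: int = 2,
                     threshold: float = 0.75) -> List[Tuple[int, int, str]]:
    """HYPOTHETICAL partitioning + core selection (not HC-SpMM's actual model).

    Splits A into windows of `window` consecutive rows, compacts each window to
    the distinct columns it touches, and labels windows whose compacted density
    reaches `threshold` as "tensor" and the rest as "cuda". Empty windows are
    labeled "skip".
    """
    indptr, indices, _ = a
    n_rows = len(indptr) - 1
    plan = []
    for start in range(0, n_rows, window):
        end = min(start + window, n_rows)
        nnz = indptr[end] - indptr[start]
        if nnz == 0:
            plan.append((start, end, "skip"))
            continue
        cols = set(indices[indptr[start]:indptr[end]])
        density = nnz / ((end - start) * len(cols))
        plan.append((start, end, "tensor" if density >= threshold else "cuda"))
    return plan


if __name__ == "__main__":
    A = [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 0, 5], [6, 0, 0, 0]]
    B = [[1, 0], [0, 1], [1, 1], [2, 3]]
    csr = dense_to_csr(A)
    print(csr_spmm(csr, B))       # [[1.0, 2.0], [3.0, 4.0], [10.0, 15.0], [6.0, 0.0]]
    print(plan_row_windows(csr))  # [(0, 2, 'tensor'), (2, 4, 'cuda')]
```

The compaction step is an assumed design choice: dropping all-zero columns from a row window raises its effective density, which is the kind of property that suits dense matrix-multiply units such as Tensor cores, while windows that stay sparse after compaction gain little from dense tiles and are routed to the scalar path in this sketch.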