Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation

Bibliographic details
Published in: arXiv.org (Dec 23, 2024), p. n/a
Main author: Wang, Tuowei
Other authors: Li, Kun; Bai, Donglin; Ju, Fusong; Xia, Leo; Cao, Ting; Ren, Ju; Zhang, Yaoxue; Mao, Yang
Published: Cornell University Library, arXiv.org
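Subjects: Elastic properties; Workload; Parallel processing; Quantum chemistry; Permutations; Graphics processing units; Data structures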
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3147267343
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3147267343 
045 0 |b d20241223 
100 1 |a Wang, Tuowei 
245 1 |a Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation 
260 |b Cornell University Library, arXiv.org  |c Dec 23, 2024 
513 |a Working Paper 
520 3 |a AI infrastructures, predominantly GPUs, have delivered remarkable performance gains for deep learning. Conversely, scientific computing, exemplified by quantum chemistry systems, suffers from dynamic diversity, where computational patterns are more diverse and vary dynamically, posing a significant challenge to sponge acceleration off GPUs. In this paper, we propose Matryoshka, a novel elastically-parallel technique for the efficient execution of quantum chemistry systems with dynamic diversity on GPUs. Matryoshka capitalizes on Elastic Parallelism Transformation, a property prevalent in scientific systems yet underexplored for dynamic diversity, to elastically realign parallel patterns with GPU architecture. Structured around three transformation primitives (Permutation, Deconstruction, and Combination), Matryoshka encompasses three core components. The Block Constructor serves as the central orchestrator, which reformulates data structures accommodating dynamic inputs and constructs fine-grained GPU-efficient compute blocks. Within each compute block, the Graph Compiler operates offline, generating high-performance code with a clear computational path through an automated compilation process. The Workload Allocator dynamically schedules workloads with varying operational intensities to threads online. It achieves highly efficient parallelism for compute-intensive operations and facilitates fusion with neighboring memory-intensive operations automatically. Extensive evaluation shows that Matryoshka effectively addresses dynamic diversity, yielding acceleration improvements of up to 13.86x (average 9.41x) over prevailing state-of-the-art approaches on 13 quantum chemistry systems. 
653 |a Elastic properties 
653 |a Workload 
653 |a Parallel processing 
653 |a Quantum chemistry 
653 |a Permutations 
653 |a Graphics processing units 
653 |a Data structures 
700 1 |a Li, Kun 
700 1 |a Bai, Donglin 
700 1 |a Ju, Fusong 
700 1 |a Xia, Leo 
700 1 |a Cao, Ting 
700 1 |a Ren, Ju 
700 1 |a Zhang, Yaoxue 
700 1 |a Mao, Yang 
773 0 |t arXiv.org  |g (Dec 23, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3147267343/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.13203