An alternative GPU acceleration for a pseudopotential plane-waves density functional theory code with applications to metallic systems

Bibliographic details
Published in:	arXiv.org (Dec 2, 2024), p. n/a
Main author:	Gong, Xuejun
Other authors:	Andrea Dal Corso
Published:	Cornell University Library, arXiv.org
Subjects:	Parallel processing; Linear algebra; Matrices (mathematics); Graphics processing units; Pseudopotentials; Plane waves; Linear systems; Routines; Matrix algebra; Brillouin zones; Density functional theory; Perturbation theory; Electronic structure; Bloch waves; Functionals; Hamiltonian functions
Online access:	Citation/Abstract; Full text outside of ProQuest

MARC


LEADER	00000nab a2200000uu 4500
001	3138995499
003	UK-CbPIL
022			\|a 2331-8422
024	7		\|a 10.1016/j.cpc.2024.109439 \|2 doi
035			\|a 3138995499
045	0		\|b d20241202
100	1		\|a Gong, Xuejun
245	1		\|a An alternative GPU acceleration for a pseudopotential plane-waves density functional theory code with applications to metallic systems
260			\|b Cornell University Library, arXiv.org \|c Dec 2, 2024
513			\|a Working Paper
520	3		\|a We present an alternative GPU acceleration for plane-wave pseudopotential electronic structure codes, designed for systems that have small unit cells but require a large number of k points to sample the Brillouin zone, as happens, for instance, in metals. We discuss the diagonalization of the Kohn-Sham equations and the solution of the linear system derived in density functional perturbation theory. Both problems benefit from rewriting the routine that applies the Hamiltonian to the Bloch wave-functions so that it works simultaneously (in parallel on the GPU threads) on wave-functions with different wave-vectors k, as many as the GPU memory allows. Our implementation is written in CUDA Fortran and makes extensive use of kernel routines that run on the GPU (GLOBAL routines) or can be called from inside the GPU threads (DEVICE routines). We compare our method with the CPU-only calculation and with the approach currently implemented in Quantum ESPRESSO, which uses GPU-accelerated libraries for the FFT and for linear algebra tasks such as matrix-matrix multiplications, as well as OpenACC directives for loop parallelization. We show in a realistic example that our method can give a significant improvement in the cases for which it has been designed.
653			\|a Parallel processing
653			\|a Linear algebra
653			\|a Matrices (mathematics)
653			\|a Graphics processing units
653			\|a Pseudopotentials
653			\|a Plane waves
653			\|a Linear systems
653			\|a Routines
653			\|a Matrix algebra
653			\|a Brillouin zones
653			\|a Density functional theory
653			\|a Perturbation theory
653			\|a Electronic structure
653			\|a Bloch waves
653			\|a Functionals
653			\|a Hamiltonian functions
700	1		\|a Andrea Dal Corso
773	0		\|t arXiv.org \|g (Dec 2, 2024), p. n/a
786	0		\|d ProQuest \|t Engineering Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3138995499/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch
856	4	0	\|3 Full text outside of ProQuest \|u http://arxiv.org/abs/2412.01695