Scalable QR Factorisation of Ill-Conditioned Tall-and-Skinny Matrices on Distributed GPU Systems

Bibliographic Details
Published in: Mathematics vol. 13, no. 22 (2025), p. 3608-3629
Main Author: Mijić Nenad
Other Authors: Kaushik Abhiram, Živković Dario, Davidović Davor
Published:
MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3275542003
003 UK-CbPIL
022 |a 2227-7390 
024 7 |a 10.3390/math13223608  |2 doi 
035 |a 3275542003 
045 2 |b d20250101  |b d20251231 
084 |a 231533  |2 nlm 
100 1 |a Mijić Nenad  |u Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; nenad.mijic@irb.hr (N.M.); abhiram.k.badrinarayanan@jyu.fi (A.K.); dario.zivkovic@irb.hr (D.Ž.) 
245 1 |a Scalable QR Factorisation of Ill-Conditioned Tall-and-Skinny Matrices on Distributed GPU Systems 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a The QR factorisation is a cornerstone of numerical linear algebra, essential for solving overdetermined linear systems, eigenvalue problems, and various scientific computing tasks. However, computing it for ill-conditioned tall-and-skinny (TS) matrices on large-scale distributed-memory systems, particularly those with multiple GPUs, presents significant challenges in balancing numerical stability, high performance, and efficient communication. Traditional Householder-based QR methods provide numerical stability but perform poorly on TS matrices due to their reliance on memory-bound kernels. This paper introduces a novel algorithm for computing the QR factorisation of ill-conditioned TS matrices based on CholeskyQR methods. Although CholeskyQR is fast, it typically fails due to severe loss of orthogonality for ill-conditioned inputs. To solve this, our new algorithm, mCQRGSI+, combines the speed of CholeskyQR with stabilising techniques from the Gram–Schmidt process. It is specifically optimised for distributed multi-GPU systems, using adaptive strategies to balance computation and communication. Our analysis shows the method achieves accuracy comparable to Householder QR, even for extremely ill-conditioned matrices (condition numbers up to $10^{16}$). Scaling experiments demonstrate speedups of up to 12× over ScaLAPACK and 16× over SLATE’s CholeskyQR2. This work delivers a method that is both robust and highly parallel, advancing the state-of-the-art for this challenging class of problems.
653 |a Eigenvalues 
653 |a Adaptive systems 
653 |a Computation 
653 |a Linear algebra 
653 |a Matrices (mathematics) 
653 |a Graphics processing units 
653 |a Communication 
653 |a Decomposition 
653 |a Linear systems 
653 |a Algorithms 
653 |a Matrix algebra 
653 |a Numerical stability 
653 |a Stability 
653 |a Critical path 
653 |a Distributed memory 
653 |a Orthogonality 
653 |a Factorization 
700 1 |a Kaushik Abhiram  |u Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; nenad.mijic@irb.hr (N.M.); abhiram.k.badrinarayanan@jyu.fi (A.K.); dario.zivkovic@irb.hr (D.Ž.) 
700 1 |a Živković Dario  |u Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; nenad.mijic@irb.hr (N.M.); abhiram.k.badrinarayanan@jyu.fi (A.K.); dario.zivkovic@irb.hr (D.Ž.) 
700 1 |a Davidović Davor  |u Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; nenad.mijic@irb.hr (N.M.); abhiram.k.badrinarayanan@jyu.fi (A.K.); dario.zivkovic@irb.hr (D.Ž.) 
773 0 |t Mathematics  |g vol. 13, no. 22 (2025), p. 3608-3629 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3275542003/abstract/embedded/9R349J4AAH19K9LJ?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3275542003/fulltextwithgraphics/embedded/9R349J4AAH19K9LJ?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3275542003/fulltextPDF/embedded/9R349J4AAH19K9LJ?source=fedsrch
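
Note on the abstract (field 520): the summary refers to CholeskyQR and CholeskyQR2 without spelling them out. The sketch below is a minimal single-node NumPy illustration of those two textbook schemes, included only to make the terms concrete. It is not the paper's mCQRGSI+ algorithm and it does nothing distributed or GPU-specific. The function names and the helper `make_test_matrix` are hypothetical, chosen for this example.

```python
import numpy as np

def cholesky_qr(a):
    """One CholeskyQR pass: R is the Cholesky factor of A^T A, Q = A R^{-1}."""
    gram = a.T @ a                       # n x n Gram matrix (cheap for tall-skinny A)
    r = np.linalg.cholesky(gram).T       # NumPy returns lower L; R = L^T is upper
    # Solve R^T X = A^T, so X^T = A R^{-1}. A general solver is used for brevity;
    # a dedicated triangular solve would be the usual choice in practice.
    q = np.linalg.solve(r.T, a.T).T
    return q, r

def cholesky_qr2(a):
    """CholeskyQR2: run CholeskyQR twice to repair the orthogonality of Q."""
    q1, r1 = cholesky_qr(a)
    q, r2 = cholesky_qr(q1)
    return q, r2 @ r1                    # product of upper-triangular factors

def orthogonality_error(q):
    """Frobenius norm of Q^T Q - I."""
    return np.linalg.norm(q.T @ q - np.eye(q.shape[1]))

def make_test_matrix(m, n, cond, seed=0):
    """Hypothetical helper: m x n matrix with 2-norm condition number ~cond."""
    rng = np.random.default_rng(seed)
    u, _ = np.linalg.qr(rng.standard_normal((m, n)))
    v, _ = np.linalg.qr(rng.standard_normal((n, n)))
    sigma = np.logspace(0, -np.log10(cond), n)   # singular values from 1 to 1/cond
    return (u * sigma) @ v.T

if __name__ == "__main__":
    a = make_test_matrix(2000, 20, cond=1e3)
    q1, _ = cholesky_qr(a)
    q2, r = cholesky_qr2(a)
    print("CholeskyQR  orthogonality error:", orthogonality_error(q1))
    print("CholeskyQR2 orthogonality error:", orthogonality_error(q2))
    print("relative residual:", np.linalg.norm(q2 @ r - a) / np.linalg.norm(a))
```

Forming the Gram matrix squares the condition number, so in double precision the Cholesky step can be expected to break down once the condition number approaches roughly 10^8. That is the failure mode the abstract describes for ill-conditioned inputs, and the reason the example uses a mildly conditioned matrix.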