Scalable QR Factorisation of Ill-Conditioned Tall-and-Skinny Matrices on Distributed GPU Systems
| Published in: | Mathematics vol. 13, no. 22 (2025), p. 3608-3629 |
|---|---|
| Main Author: | Mijić Nenad |
| Other Authors: | Kaushik Abhiram, Živković Dario, Davidović Davor |
| Published: | MDPI AG |
| Subjects: | |
| Online Access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3275542003 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2227-7390 | ||
| 024 | 7 | |a 10.3390/math13223608 |2 doi | |
| 035 | |a 3275542003 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 231533 |2 nlm | ||
| 100 | 1 | |a Mijić Nenad |u Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; nenad.mijic@irb.hr (N.M.); abhiram.k.badrinarayanan@jyu.fi (A.K.); dario.zivkovic@irb.hr (D.Ž.) | |
| 245 | 1 | |a Scalable QR Factorisation of Ill-Conditioned Tall-and-Skinny Matrices on Distributed GPU Systems | |
| 260 | |b MDPI AG |c 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a The QR factorisation is a cornerstone of numerical linear algebra, essential for solving overdetermined linear systems, eigenvalue problems, and various scientific computing tasks. However, computing it for ill-conditioned tall-and-skinny (TS) matrices on large-scale distributed-memory systems, particularly those with multiple GPUs, presents significant challenges in balancing numerical stability, high performance, and efficient communication. Traditional Householder-based QR methods provide numerical stability but perform poorly on TS matrices due to their reliance on memory-bound kernels. This paper introduces a novel algorithm for computing the QR factorisation of ill-conditioned TS matrices based on CholeskyQR methods. Although CholeskyQR is fast, it typically fails due to severe loss of orthogonality for ill-conditioned inputs. To solve this, our new algorithm, mCQRGSI+, combines the speed of CholeskyQR with stabilising techniques from the Gram–Schmidt process. It is specifically optimised for distributed multi-GPU systems, using adaptive strategies to balance computation and communication. Our analysis shows the method achieves accuracy comparable to Householder QR, even for extremely ill-conditioned matrices (condition numbers up to $10^{16}$). Scaling experiments demonstrate speedups of up to 12× over ScaLAPACK and 16× over SLATE’s CholeskyQR2. This work delivers a method that is both robust and highly parallel, advancing the state-of-the-art for this challenging class of problems. | |
| 653 | |a Eigenvalues | ||
| 653 | |a Adaptive systems | ||
| 653 | |a Computation | ||
| 653 | |a Linear algebra | ||
| 653 | |a Matrices (mathematics) | ||
| 653 | |a Graphics processing units | ||
| 653 | |a Communication | ||
| 653 | |a Decomposition | ||
| 653 | |a Linear systems | ||
| 653 | |a Algorithms | ||
| 653 | |a Matrix algebra | ||
| 653 | |a Numerical stability | ||
| 653 | |a Stability | ||
| 653 | |a Critical path | ||
| 653 | |a Distributed memory | ||
| 653 | |a Orthogonality | ||
| 653 | |a Factorization | ||
| 700 | 1 | |a Kaushik Abhiram |u Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; nenad.mijic@irb.hr (N.M.); abhiram.k.badrinarayanan@jyu.fi (A.K.); dario.zivkovic@irb.hr (D.Ž.) | |
| 700 | 1 | |a Živković Dario |u Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; nenad.mijic@irb.hr (N.M.); abhiram.k.badrinarayanan@jyu.fi (A.K.); dario.zivkovic@irb.hr (D.Ž.) | |
| 700 | 1 | |a Davidović Davor |u Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; nenad.mijic@irb.hr (N.M.); abhiram.k.badrinarayanan@jyu.fi (A.K.); dario.zivkovic@irb.hr (D.Ž.) | |
| 773 | 0 | |t Mathematics |g vol. 13, no. 22 (2025), p. 3608-3629 | |
| 786 | 0 | |d ProQuest |t Engineering Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3275542003/abstract/embedded/9R349J4AAH19K9LJ?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text + Graphics |u https://www.proquest.com/docview/3275542003/fulltextwithgraphics/embedded/9R349J4AAH19K9LJ?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3275542003/fulltextPDF/embedded/9R349J4AAH19K9LJ?source=fedsrch |
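
The abstract notes that CholeskyQR is fast but loses orthogonality on ill-conditioned inputs, and it names CholeskyQR2 as a baseline. The following is a minimal, dependency-free sketch of textbook CholeskyQR and CholeskyQR2 that makes that behaviour visible on a tiny example. It is not the paper's mCQRGSI+ algorithm, which this record does not describe in detail. The function names, the Läuchli-style test matrix, and the value of `eps` are assumptions chosen for illustration only.

```python
import math


def cholesky_qr(A):
    """One CholeskyQR pass: A = Q R, with R the Cholesky factor of A^T A."""
    n = len(A[0])
    # Gram matrix G = A^T A (n x n, small when A is tall and skinny).
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    # Upper-triangular Cholesky factor R such that R^T R = G.
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = G[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if d <= 0.0:
            # Breakdown: happens when A is too ill-conditioned for this method.
            raise ValueError("Gram matrix is numerically not positive definite")
        R[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            s = sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = (G[i][j] - s) / R[i][i]
    # Q = A R^{-1}, computed row by row with a triangular solve.
    Q = []
    for row in A:
        q = [0.0] * n
        for j in range(n):
            q[j] = (row[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j]
        Q.append(q)
    return Q, R


def cholesky_qr2(A):
    """CholeskyQR2: apply CholeskyQR twice and combine the triangular factors."""
    Q1, R1 = cholesky_qr(A)
    Q2, R2 = cholesky_qr(Q1)
    n = len(R1)
    # A = Q1 R1 = Q2 (R2 R1), so the combined factor is R = R2 R1.
    R = [[sum(R2[i][k] * R1[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return Q2, R


def orthogonality_error(Q):
    """Largest entry of |Q^T Q - I|; zero for exactly orthonormal columns."""
    n = len(Q[0])
    return max(
        abs(sum(r[i] * r[j] for r in Q) - (1.0 if i == j else 0.0))
        for i in range(n) for j in range(n)
    )


# Hypothetical test input: a 3 x 2 Läuchli-style matrix with nearly
# parallel columns; a smaller eps makes it more ill-conditioned.
eps = 1e-4
A_demo = [[1.0, 1.0], [eps, 0.0], [0.0, eps]]

Q_once, _ = cholesky_qr(A_demo)
Q_twice, R_twice = cholesky_qr2(A_demo)
print("orthogonality error, CholeskyQR :", orthogonality_error(Q_once))
print("orthogonality error, CholeskyQR2:", orthogonality_error(Q_twice))
```

Forming the Gram matrix squares the condition number of the input, which is why a single pass degrades as `eps` shrinks and eventually hits the breakdown branch. The second pass in CholeskyQR2 helps only while the first pass still succeeds. The abstract states that mCQRGSI+ addresses harder cases with Gram–Schmidt-based stabilisation.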