High-Performance Matrix Multiplication Using Block Parallelization on CPU and GPU

Bibliographic Details
Published in: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings (2025), p. 1-6
Main author: Rakhimov, Mekhriddin
Other authors: Ochilov, Mannon; Javliev, Shakhzod
Published: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Online access: Citation/Abstract
Description
Abstract: Conference: 2025 IEEE XVII International Scientific and Technical Conference on Actual Problems of Electronic Instrument Engineering (APEIE), 14-16 November 2025, Novosibirsk, Russian Federation.
This paper examines the performance issues that arise when computing devices perform arithmetic operations on large matrices. One of the most effective approaches to matrix multiplication is to divide a large matrix into blocks and process those blocks in parallel. The authors apply this block-based method to matrices of different sizes on a graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA) technology, and on a central processing unit (CPU) using the Open Multi-Processing (OpenMP) parallel library for devices without a GPU. The study measures the time required to multiply matrices of sizes 64x64, 128x128, 512x512, 1024x1024, and 2048x2048 with a simple sequential naive method and with the parallel block-based method implemented with these parallel processing technologies. The performance of the block-parallel method on the GPU with CUDA and on the CPU with OpenMP is also compared with the existing CUDA Basic Linear Algebra Subprograms (cuBLAS) libraries for NVIDIA GPUs and the Intel Math Kernel Library (MKL) for Intel processors. The proposed approach gives the user full control over the programming model, allows the algorithm and block size to be customized, and performs the computations as quickly as the existing libraries.
DOI: 10.1109/APEIE66761.2025.11289229
Source: Science Database
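
Note: the record does not reproduce the authors' implementation. As a rough illustration of the technique the abstract describes, the sketch below performs block-based matrix multiplication of square row-major float matrices both as a shared-memory tiled CUDA kernel for the GPU and as an OpenMP-parallel blocked loop for the CPU. The matrix size N = 1024 matches one of the benchmarked sizes, while the block sizes TILE = 16 and BS = 64 are illustrative assumptions, not values taken from the paper.

// Minimal sketch (not the authors' code): C = A * B for square N x N row-major
// float matrices, on the GPU with a shared-memory tiled CUDA kernel and on the
// CPU with an OpenMP-parallel blocked loop. TILE and BS stand in for the
// tunable block size discussed in the abstract; 16 and 64 are assumed values.
#include <cuda_runtime.h>
#include <omp.h>
#include <cstdio>
#include <cstdlib>

#define TILE 16   // GPU tile width (threads per block side)

// Each thread block computes one TILE x TILE tile of C, staging the needed
// tiles of A and B in shared memory to reduce global-memory traffic.
__global__ void matmulTiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < N && col < N) C[row * N + col] = acc;
}

// CPU path for machines without a GPU: cache-blocked triple loop, with the
// outer blocks of C distributed across threads by OpenMP. C must be zeroed.
void matmulBlockedOmp(const float* A, const float* B, float* C, int N, int BS) {
    #pragma omp parallel for collapse(2)
    for (int ii = 0; ii < N; ii += BS)
        for (int jj = 0; jj < N; jj += BS)
            for (int kk = 0; kk < N; kk += BS)
                for (int i = ii; i < ii + BS && i < N; ++i)
                    for (int k = kk; k < kk + BS && k < N; ++k) {
                        float a = A[i * N + k];
                        for (int j = jj; j < jj + BS && j < N; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

int main() {
    const int N = 1024;                     // one of the sizes benchmarked in the paper
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = (float*)malloc(bytes);
    float *hB = (float*)malloc(bytes);
    float *hC = (float*)calloc((size_t)N * N, sizeof(float));
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // GPU: copy inputs, launch the tiled kernel, copy the result back.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE), grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, N);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("GPU C[0] = %.1f (expected %.1f)\n", hC[0], 2.0f * N);

    // CPU: same multiplication with the OpenMP blocked loop, block size 64.
    for (int i = 0; i < N * N; ++i) hC[i] = 0.0f;
    matmulBlockedOmp(hA, hB, hC, N, 64);
    printf("CPU C[0] = %.1f (expected %.1f)\n", hC[0], 2.0f * N);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}

A file like this would typically be built with nvcc, passing the OpenMP flag through to the host compiler (for example, nvcc -Xcompiler -fopenmp on a Linux host with GCC); comparisons against cuBLAS and Intel MKL, as in the paper, would link those libraries separately.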