High-Performance Matrix Multiplication Using Block Parallelization on CPU and GPU

Bibliographic Details
Published in: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings (2025), p. 1-6
Main Author: Rakhimov, Mekhriddin
Other Authors: Mannon Ochilov; Javliev, Shakhzod
Published:
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
Online Access: Citation/Abstract

MARC

LEADER 00000nab a2200000uu 4500
001 3284878429
003 UK-CbPIL
024 7 |a 10.1109/APEIE66761.2025.11289229  |2 doi 
035 |a 3284878429 
045 2 |b d20250101  |b d20251231 
084 |a 228229  |2 nlm 
100 1 |a Rakhimov, Mekhriddin  |u Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi,Department of Computer Systems,Tashkent,Uzbekistan 
245 1 |a High-Performance Matrix Multiplication Using Block Parallelization on CPU and GPU 
260 |b The Institute of Electrical and Electronics Engineers, Inc. (IEEE)  |c 2025 
513 |a Conference Proceedings 
520 3 |a Conference Title: 2025 IEEE XVII International Scientific and Technical Conference on Actual Problems of Electronic Instrument Engineering (APEIE). Conference Dates: 2025 Nov. 14-16. Conference Location: Novosibirsk, Russian Federation. This paper examines the performance issues that arise when computing devices perform arithmetic operations on large matrices. One effective approach to matrix multiplication is the block-based method, which divides a large matrix into smaller blocks. The method is implemented in parallel on a computer's graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA) technology and, for devices without a GPU, on the central processing unit (CPU) using the Open Multi-Processing (OpenMP) parallel library. The study measures the time required to multiply matrices of sizes 64x64, 128x128, 512x512, 1024x1024, and 2048x2048 with a simple sequential naive method and with the parallel block-based method under these technologies. The performance of the block parallel method implemented on a GPU with CUDA and on a CPU with OpenMP is also compared with the existing CUDA Basic Linear Algebra Subprograms (cuBLAS) library for NVIDIA GPUs and the Intel Math Kernel Library for Intel processors. The proposed approach lets the user fully control the programming model, customize the algorithm, change the block size, and perform computations as quickly as the existing libraries. 
653 |a Parallel processing 
653 |a Multiplication 
653 |a Central processing units--CPUs 
653 |a Linear algebra 
653 |a Computer architecture 
653 |a Matrices (mathematics) 
653 |a Electronic equipment 
653 |a Graphics processing units 
653 |a Microprocessors 
653 |a Economic 
700 1 |a Mannon Ochilov  |u Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi,Department of Artificial Intelligence,Tashkent,Uzbekistan 
700 1 |a Javliev, Shakhzod  |u Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi,Department of Computer Systems,Tashkent,Uzbekistan 
773 0 |t The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings  |g (2025), p. 1-6 
786 0 |d ProQuest  |t Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3284878429/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
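The block-based method the abstract describes can be illustrated with a tiled matrix multiplication. The sketch below is not the authors' code; it is a minimal, generic CPU version of the technique, assuming square row-major matrices and a tunable block size `BS`. Each tile of the input matrices is reused while it is cache-resident, and the outer tile loop can be parallelized with an OpenMP pragma (ignored if not compiled with `-fopenmp`).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tile (block) size -- a tunable parameter, as in the paper's approach.
constexpr std::size_t BS = 32;

// Blocked matrix multiplication: C += A * B for n x n row-major matrices.
// The i/k/j loops are tiled so each BSxBS tile of A and B is reused while
// it stays in cache. Parallelizing over the row-tile loop is race-free
// because different i-tiles write disjoint rows of C.
void matmul_blocked(const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C, std::size_t n) {
    #pragma omp parallel for
    for (std::size_t ii = 0; ii < n; ii += BS)
        for (std::size_t kk = 0; kk < n; kk += BS)
            for (std::size_t jj = 0; jj < n; jj += BS)
                // Multiply the (ii,kk) tile of A by the (kk,jj) tile of B.
                for (std::size_t i = ii; i < std::min(ii + BS, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BS, n); ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + BS, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

A CUDA variant of the same idea would assign one tile of C to each thread block and stage the corresponding tiles of A and B in shared memory; the tiling structure is the part the paper's "block parallel method" refers to.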