High-Performance Matrix Multiplication Using Block Parallelization on CPU and GPU

Bibliographic Details
Published in: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings (2025), p. 1-6
Main Author: Rakhimov, Mekhriddin
Other Authors: Mannon Ochilov; Javliev, Shakhzod
Published:
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
Online Access: Citation/Abstract

MARC

LEADER 00000nab a2200000uu 4500
001 3284878429
003 UK-CbPIL
024 7 |a 10.1109/APEIE66761.2025.11289229  |2 doi 
035 |a 3284878429 
045 2 |b d20250101  |b d20251231 
084 |a 228229  |2 nlm 
100 1 |a Rakhimov, Mekhriddin  |u Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi,Department of Computer Systems,Tashkent,Uzbekistan 
245 1 |a High-Performance Matrix Multiplication Using Block Parallelization on CPU and GPU 
260 |b The Institute of Electrical and Electronics Engineers, Inc. (IEEE)  |c 2025 
513 |a Conference Proceedings 
520 3 |a Conference Title: 2025 IEEE XVII International Scientific and Technical Conference on Actual Problems of Electronic Instrument Engineering (APEIE). Conference Dates: 2025 Nov. 14-16. Conference Location: Novosibirsk, Russian Federation. This paper examines the performance issues that arise when computing devices perform arithmetic operations on large matrices. One effective approach to matrix multiplication is the block-based method, which divides a large matrix into smaller blocks. The method is implemented in parallel on a computer's graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA) technology and, for devices without a GPU, on the central processing unit (CPU) using the Open Multi-Processing (OpenMP) parallel library. The study measures the time required to multiply matrices of sizes 64x64, 128x128, 512x512, 1024x1024, and 2048x2048 with a simple sequential naive method and with the parallel block-based method under these technologies. The performance of the block parallel method implemented on a GPU with CUDA and on a CPU with OpenMP is also compared with the existing CUDA Basic Linear Algebra Subprograms (cuBLAS) library for NVIDIA GPUs and the Intel Math Kernel Library for Intel processors. The proposed approach lets the user fully control the programming model, customize the algorithm, change the block size, and perform computations as quickly as the existing libraries. 
653 |a Parallel processing 
653 |a Multiplication 
653 |a Central processing units--CPUs 
653 |a Linear algebra 
653 |a Computer architecture 
653 |a Matrices (mathematics) 
653 |a Electronic equipment 
653 |a Graphics processing units 
653 |a Microprocessors 
653 |a Economic 
700 1 |a Mannon Ochilov  |u Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi,Department of Artificial Intelligence,Tashkent,Uzbekistan 
700 1 |a Javliev, Shakhzod  |u Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi,Department of Computer Systems,Tashkent,Uzbekistan 
773 0 |t The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings  |g (2025), p. 1-6 
786 0 |d ProQuest  |t Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3284878429/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
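The block-based method the abstract describes can be illustrated with a tiled matrix multiplication. The sketch below is not the authors' code; it is a minimal, generic CPU version of the technique, assuming square row-major matrices and a tunable block size `BS`. Each tile of the input matrices is reused while it is cache-resident, and the outer tile loop can be parallelized with an OpenMP pragma (ignored if not compiled with `-fopenmp`).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tile (block) size -- a tunable parameter, as in the paper's approach.
constexpr std::size_t BS = 32;

// Blocked matrix multiplication: C += A * B for n x n row-major matrices.
// The i/k/j loops are tiled so each BSxBS tile of A and B is reused while
// it stays in cache. Parallelizing over the row-tile loop is race-free
// because different i-tiles write disjoint rows of C.
void matmul_blocked(const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C, std::size_t n) {
    #pragma omp parallel for
    for (std::size_t ii = 0; ii < n; ii += BS)
        for (std::size_t kk = 0; kk < n; kk += BS)
            for (std::size_t jj = 0; jj < n; jj += BS)
                // Multiply the (ii,kk) tile of A by the (kk,jj) tile of B.
                for (std::size_t i = ii; i < std::min(ii + BS, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BS, n); ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + BS, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

A CUDA variant of the same idea would assign one tile of C to each thread block and stage the corresponding tiles of A and B in shared memory; the tiling structure is the part the paper's "block parallel method" refers to.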