Analyzing Performance of Data Preprocessing Techniques on CPUs vs. GPUs with and Without the MapReduce Environment

Bibliographic Details
Published in: Electronics, vol. 14, no. 18 (2025), pp. 3597-3623
Main Author: Bagui, Sikha S
Other Authors: Eller, Colin; Armour, Rianna; Singh, Shivani; Bagui, Subhash C; Mink, Dustin
Publisher: MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3254508438
003 UK-CbPIL
022 |a 2079-9292 
024 7 |a 10.3390/electronics14183597  |2 doi 
035 |a 3254508438 
045 2 |b d20250101  |b d20251231 
084 |a 231458  |2 nlm 
100 1 |a Bagui, Sikha S  |u Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA; ce53@students.uwf.edu (C.E.); rka7@students.uwf.edu (R.A.); fs65@students.uwf.edu (S.S.) 
245 1 |a Analyzing Performance of Data Preprocessing Techniques on CPUs vs. GPUs with and Without the MapReduce Environment 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Data preprocessing is usually necessary before running most machine learning classifiers. This work compares three different preprocessing techniques: minimal preprocessing, Principal Components Analysis (PCA), and Linear Discriminant Analysis (LDA). The efficiency of these three preprocessing techniques is measured using the Support Vector Machine (SVM) classifier. Efficiency is measured in terms of statistical metrics such as accuracy, precision, recall, the F-1 measure, and AUROC. The preprocessing times and the classifier run times are also compared using the three differently preprocessed datasets. Finally, a comparison of performance timings on CPUs vs. GPUs with and without the MapReduce environment is performed. Two newly created Zeek Connection Log datasets, UWF-ZeekData22 and UWF-ZeekDataFall22, collected using the Security Onion 2 network security monitor and labeled using the MITRE ATT&CK framework, are used for this work. Results from this work show that binomial LDA, on average, performs the best in terms of statistical measures as well as timings using GPUs or MapReduce GPUs. 
653 |a Big Data 
653 |a Machine learning 
653 |a Datasets 
653 |a Accuracy 
653 |a Preprocessing 
653 |a Security 
653 |a Principal components analysis 
653 |a Support vector machines 
653 |a Classification 
653 |a Cybersecurity 
653 |a Discriminant analysis 
653 |a Algorithms 
653 |a Performance evaluation 
653 |a Internet of Things 
700 1 |a Eller, Colin  |u Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA; ce53@students.uwf.edu (C.E.); rka7@students.uwf.edu (R.A.); fs65@students.uwf.edu (S.S.) 
700 1 |a Armour, Rianna  |u Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA; ce53@students.uwf.edu (C.E.); rka7@students.uwf.edu (R.A.); fs65@students.uwf.edu (S.S.) 
700 1 |a Singh, Shivani  |u Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA; ce53@students.uwf.edu (C.E.); rka7@students.uwf.edu (R.A.); fs65@students.uwf.edu (S.S.) 
700 1 |a Bagui, Subhash C  |u Department of Mathematics and Statistics, The University of West Florida, Pensacola, FL 32514, USA; sbagu@uwf.edu 
700 1 |a Mink, Dustin  |u Department of Cybersecurity, The University of West Florida, Pensacola, FL 32514, USA; dmink@uwf.edu 
773 0 |t Electronics  |g vol. 14, no. 18 (2025), p. 3597-3623 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3254508438/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3254508438/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3254508438/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch