BlueEdge: application design for big data cleaning processing using mobile edge computing environments

Bibliographic Details
Published in: Journal of Big Data vol. 12, no. 1 (Aug 2025), p. 204
Main Author: Elmobark, Nagwa
Other Authors: El-ghareeb, Haitham; Elhishi, Sara
Published: Springer Nature B.V.
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3241754801
003 UK-CbPIL
022 |a 2196-1115 
024 7 |a 10.1186/s40537-025-01262-y  |2 doi 
035 |a 3241754801 
045 2 |b d20250801  |b d20250831 
100 1 |a Elmobark, Nagwa  |u Mansoura University, Faculty of Computer Science, Mansoura, Egypt (GRID:grid.10251.37) (ISNI:0000 0001 0342 6662) 
245 1 |a BlueEdge: application design for big data cleaning processing using mobile edge computing environments 
260 |b Springer Nature B.V.  |c Aug 2025 
513 |a Journal Article 
520 3 |a With the rapid growth of the Internet of Things (IoT) and the emergence of big data, handling massive amounts of data has become a major challenge. Traditional approaches send raw data to cloud data centers for cleaning, processing, and interpretation using data warehouse tools. This study instead introduces BlueEdge, a fog edge mobile application that aims to shift the cleaning and preprocessing tasks from the cloud to the edge. We compare BlueEdge with four popular data cleaning tools (WinPure, DoubleTake, WizSame, and DQGlobal) that operate within data warehouse architectures, such as Hadoop servers. The comparison considers criteria such as time consumption, resource utilization (memory and CPU), and tool performance. BlueEdge utilizes Natural Language Processing (NLP) techniques, including those from the Natural Language Toolkit (NLTK) and Python packages, to connect with a real-time database. In our results, BlueEdge achieved accuracy between 72% and 95% across 6 categories of name-based duplicate detection tasks, demonstrating competitive performance in mobile edge environments. The framework was validated on a larger dataset of 146 error cases, with statistically significant values and confidence intervals of between 3.4% and 5.8%. Statistical comparison indicates consistently significant changes (p < 0.05) compared to baseline settings of four commercial tools, with large effect sizes (Cohen's d: 0.89–1.34). BlueEdge handles duplicate-elimination cases such as different spellings and pronunciations (78.4%, CI: 73.1–83.7%), misspellings (72.0%, CI: 66.2–77.8%), name abbreviations (90.5%, CI: 86.1–94.9%), honorific prefixes (95.2%, CI: 91.8–98.6%), common nicknames (76.2%, C The reliable performance of edge-based data cleaning is verified through cross-validation analysis (81.7% ± 2.3%), which confirms its consistency. 
BlueEdge also uses a minimal bandwidth of only 5000 bytes per edge on mobile phones, unlike data warehouses, which require 10,000–60,000 bytes on Hadoop machines. In addition, BlueEdge is designed to reduce the time taken for data cleaning to 1 s at the data edge, compared with the 4–30 s it normally takes in data warehouses. BlueEdge is easy to use, requires no authorization of the mobile device, and is provided free of charge. The framework was validated through controlled experimental testing and real-world deployment at an IT services company, achieving an overall ITSQM quality score of 8.9/10 and demonstrating practical effectiveness in organizational settings. This foundation has been further enhanced with neural network-based classification approaches, which are currently under peer review. 
653 |a Personal information 
653 |a Internet of Things 
653 |a Big Data 
653 |a Applications programs 
653 |a Cleaning 
653 |a Cell phones 
653 |a Edge computing 
653 |a Mobile computing 
653 |a Data processing 
653 |a Manufacturing 
653 |a Privacy 
653 |a Statistical analysis 
653 |a Consent 
653 |a Accountability 
653 |a Toolkits 
653 |a Neural networks 
653 |a Error correction & detection 
653 |a Abbreviations 
653 |a Data warehouses 
653 |a Cloud computing 
653 |a Sensors 
653 |a Transparency 
653 |a Compliance 
653 |a Resource utilization 
653 |a Real time 
653 |a Natural language processing 
653 |a Cost control 
653 |a Security systems 
653 |a Nicknames 
653 |a Databases 
653 |a Classification 
653 |a Personal names 
653 |a Task performance 
653 |a Organizational effectiveness 
653 |a Validity 
653 |a Authorization 
653 |a Peer review 
653 |a Pronunciation 
653 |a Application 
653 |a Internet 
653 |a Warehouses 
653 |a Software 
653 |a Deployment 
653 |a Spelling 
653 |a Honorifics 
653 |a Mobile phones 
653 |a Elimination 
653 |a Machinery 
653 |a Prefixes 
700 1 |a El-ghareeb, Haitham  |u Mansoura University, Faculty of Computer Science, Mansoura, Egypt (GRID:grid.10251.37) (ISNI:0000 0001 0342 6662) 
700 1 |a Elhishi, Sara  |u Mansoura University, Faculty of Computer Science, Mansoura, Egypt (GRID:grid.10251.37) (ISNI:0000 0001 0342 6662) 
773 0 |t Journal of Big Data  |g vol. 12, no. 1 (Aug 2025), p. 204 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3241754801/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3241754801/fulltext/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3241754801/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch
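
Illustrative note on the abstract (field 520): the abstract names several kinds of name-based duplicates (honorific prefixes, common nicknames, name abbreviations, misspellings) but does not describe BlueEdge's implementation. The following is only a minimal sketch of how such checks could be combined, assuming Python's standard-library difflib in place of NLTK. The lookup tables HONORIFICS and NICKNAMES, the function names, and the 0.8 similarity threshold are all hypothetical and are not taken from the paper.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables for illustration; the paper's lists are not given.
HONORIFICS = {"mr", "mrs", "ms", "dr", "prof"}
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def normalize_name(raw):
    """Lowercase, drop honorific prefixes, and expand known nicknames."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    tokens = [t for t in tokens if t not in HONORIFICS]
    return [NICKNAMES.get(t, t) for t in tokens]


def tokens_match(a, b, threshold):
    """Compare two name tokens, treating a single letter as an initial."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]  # abbreviation such as "J." for "John"
    # Tolerate small misspellings via a character-level similarity ratio.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_duplicate(name_a, name_b, threshold=0.8):
    """Return True if two raw names plausibly refer to the same person."""
    a, b = normalize_name(name_a), normalize_name(name_b)
    if not a or len(a) != len(b):
        return False
    return all(tokens_match(x, y, threshold) for x, y in zip(a, b))


if __name__ == "__main__":
    pairs = [
        ("Dr. William Smith", "Bill Smith"),
        ("J. Smith", "John Smith"),
        ("Jon Smith", "John Smith"),
        ("Sara Elhishi", "Nagwa Elmobark"),
    ]
    for left, right in pairs:
        print(left, "|", right, "|", is_duplicate(left, right))
```

The sketch compares names token by token after normalization, so it does not cover reordered names or dropped middle names; a real cleaning pipeline would need rules for those cases as well.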