BlueEdge: application design for big data cleaning processing using mobile edge computing environments
| Published in: | Journal of Big Data vol. 12, no. 1 (Aug 2025), p. 204 |
|---|---|
| Main author: | |
| Other authors: | |
| Published: | Springer Nature B.V. |
| Subjects: | |
| Online access: | Citation/Abstract, Full Text, Full Text - PDF |
| Abstract: | With the rapid growth of the Internet of Things (IoT) and the emergence of big data, handling massive amounts of data has become a major challenge. Traditional approaches send raw data to cloud data centers for cleaning, processing, and interpretation using data warehouse tools. This study instead introduces BlueEdge, a fog-edge mobile application that shifts cleaning and preprocessing tasks from the cloud to the edge. We compare BlueEdge with four popular data cleaning tools (WinPure, DoubleTake, WizSame, and DQGlobal) that operate within data warehouse architectures, such as Hadoop servers. The comparison considers time consumption, resource utilization (memory and CPU), and tool performance. BlueEdge uses Natural Language Processing (NLP) techniques, including those from the Natural Language Toolkit (NLTK) and Python packages, and connects to a real-time database. In our results, BlueEdge achieved accuracy between 72% and 95% across six categories of name-based duplicate detection tasks, demonstrating competitive performance in mobile edge environments. The framework was validated on a larger dataset of 146 error cases, with statistically significant results and confidence-interval margins of ±3.4 to ±5.8 percentage points. Statistical comparison shows consistently significant differences (p < 0.05) relative to the baseline settings of the four commercial tools, with large effect sizes (Cohen's d: 0.89–1.34). BlueEdge removes duplicates that arise from different spellings and pronunciations (78.4%, CI: 73.1–83.7%), misspellings (72.0%, CI: 66.2–77.8%), name abbreviations (90.5%, CI: 86.1–94.9%), honorific prefixes (95.2%, CI: 91.8–98.6%), common nicknames (76.2%, C…). Cross-validation analysis (81.7% ± 2.3%) confirms that edge-based data cleaning performs reliably and consistently.
Additionally, BlueEdge uses a minimal bandwidth of only 5000 bytes per edge on mobile phones, whereas data warehouses require 10,000–60,000 bytes on Hadoop machines. BlueEdge is also designed to reduce the time taken for data cleaning to 1 s at the data edge, compared with the 4–30 s that data warehouses normally take. BlueEdge is easy to use, requires no authorization of the mobile device, and is available free of charge. The framework was validated through controlled experimental testing and real-world deployment at an IT services company, achieving an overall ITSQM quality score of 8.9/10 and demonstrating practical effectiveness in organizational settings. This foundation has been further enhanced with neural network-based classification approaches, which are currently under peer review. |
|---|---|
| ISSN: | 2196-1115 |
| DOI: | 10.1186/s40537-025-01262-y |
| Source: | ABI/INFORM Global |
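
The abstract names several categories of name-based duplicates (misspellings, abbreviations, honorific prefixes, common nicknames) but does not describe how BlueEdge detects them. The following is a minimal sketch of how such checks could look, not the authors' implementation. It assumes Python's standard-library `difflib` for spelling similarity instead of NLTK, and the `HONORIFICS` and `NICKNAMES` tables, the function names, and the 0.8 threshold are all hypothetical. Pronunciation-based matching is not covered.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables for illustration only; not taken from BlueEdge.
HONORIFICS = {"mr", "mrs", "ms", "miss", "dr", "prof", "sir"}
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth", "mike": "michael"}


def normalize(name):
    """Lowercase, drop punctuation and leading honorifics, expand nicknames."""
    tokens = re.findall(r"[a-z]+", name.lower())
    while tokens and tokens[0] in HONORIFICS:
        tokens = tokens[1:]
    return [NICKNAMES.get(t, t) for t in tokens]


def tokens_match(a, b, threshold):
    """Match two name tokens: exact, initial versus full name, or similar spelling."""
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        # Treat a single letter as an abbreviation of the other token.
        return a[0] == b[0]
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_duplicate(name_a, name_b, threshold=0.8):
    """Return True if two names plausibly refer to the same person."""
    ta, tb = normalize(name_a), normalize(name_b)
    if not ta or len(ta) != len(tb):
        return False
    return all(tokens_match(x, y, threshold) for x, y in zip(ta, tb))
```

With these assumed tables, `is_duplicate("Dr. John Smith", "John Smith")` and `is_duplicate("Bill Gates", "William Gates")` both return `True`, while `is_duplicate("John Smith", "Jane Doe")` returns `False`. A production tool would need far larger lookup tables and a tuned threshold.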