Comprehensive Data Corruption Identification Using Machine Learning Algorithms (PAACDA)

Bibliographic Details
Published in: Turkish Journal of Computer and Mathematics Education vol. 15, no. 3 (2024), p. 144
Main Author: Vanitha, M
Other Authors: Maneesha, K, Sri, K Uma Renu, Nancy, K
Published:
Ninety Nine Publication
Subjects:
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3114535215
003 UK-CbPIL
022 |a 1309-4653 
024 7 |a 10.61841/turcomat.v15i3.14785  |2 doi 
035 |a 3114535215 
045 2 |b d20240101  |b d20241231 
100 1 |a Vanitha, M  |u Professor, Department of CSE, Malla Reddy Engineering College for Women, Autonomous, Hyderabad, 
245 1 |a Comprehensive Data Corruption Identification Using Machine Learning Algorithms (PAACDA) 
260 |b Ninety Nine Publication  |c 2024 
513 |a Journal Article 
520 3 |a With the rise of technology, data and analytics have evolved from scattered numbers and attributes in spreadsheets into a means of transforming any substantial industry. Data can be corrupted in many unethical and unlawful ways, so it is important to detect and flag all corrupted data in a dataset effectively. Detecting corrupted data, or recovering information from a corrupted dataset, is not an easy task. It is nonetheless crucial: if not handled early, corruption can cause problems in later processing with machine learning or deep learning methods. Rather than focusing on outlier identification, this study introduces PAACDA (Presence driven Adamic Adar Corruption identification Algorithm) and then consolidates the findings. Although they rely on parameter tuning to achieve high accuracy and recall, current state-of-the-art models such as Isolation Forest and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) show considerable uncertainty when corrupted data is factored in. This study investigates the specific performance problems of several unsupervised learning methods on corrupted linear and clustered datasets. In addition, we present the new PAACDA technique, which achieves a higher precision of 96.35% on clustered data and 99.04% on linear data than 15 prominent unsupervised learning baselines, including K-means clustering, Isolation Forest, and LOF (Local Outlier Factor). The study also reviews the relevant literature in depth from these perspectives. Finally, we identify the problems with current methods and suggest directions for future research in this area. 
653 |a Outliers (statistics) 
653 |a Data analysis 
653 |a Parameter identification 
653 |a Datasets 
653 |a Data processing 
653 |a Spatial data 
653 |a Technology assessment 
653 |a Clustering 
653 |a Unsupervised learning 
653 |a Damage detection 
653 |a Algorithms 
653 |a Deep learning 
653 |a Machine learning 
653 |a Accuracy 
653 |a Principal components analysis 
653 |a Identification 
653 |a Mathematics education 
653 |a Licenses 
653 |a Forests 
653 |a Open access 
700 1 |a Maneesha, K  |u Student, Department of CSE, Malla Reddy Engineering College for Women, Autonomous, Hyderabad 
700 1 |a Sri, K Uma Renu  |u Student, Department of CSE, Malla Reddy Engineering College for Women, Autonomous, Hyderabad 
700 1 |a Nancy, K  |u Student, Department of CSE, Malla Reddy Engineering College for Women, Autonomous, Hyderabad 
773 0 |t Turkish Journal of Computer and Mathematics Education  |g vol. 15, no. 3 (2024), p. 144 
786 0 |d ProQuest  |t Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3114535215/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3114535215/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch