Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost

Bibliographic Details
Published in: Electronics vol. 14, no. 9 (2025), p. 1754
Main Author: Theodorakopoulos Leonidas
Other Authors: Theodoropoulou Alexandra, Tsimakis Anastasios, Halkiopoulos Constantinos
Published: MDPI AG
Description
Abstract: This study presents an optimized distributed machine learning framework for scalable credit card fraud detection. In response to the growth in fraudulent activity, it implements PySpark-based processing of large-scale transaction datasets and integrates several machine learning models: Logistic Regression, Decision Trees, Random Forests, XGBoost, and CatBoost. The models are evaluated on scalability, accuracy, and their handling of imbalanced datasets. The key finding is that XGBoost and CatBoost are among the most promising models for complex, imbalanced data, with close-to-ideal accuracy in detecting fraudulent transactions. PySpark is instrumental in scaling these systems, enabling distributed processing, real-time analysis, and adaptive learning. The study also discusses challenges such as overfitting, data access, and real-time implementation, along with potential solutions such as ensemble methods, intelligent sampling, and graph-based approaches. Future directions include deploying these frameworks in live transaction environments, leveraging continuous learning mechanisms, and integrating advanced anomaly detection techniques to handle evolving fraud patterns. The research demonstrates the importance of distributed machine learning frameworks for developing robust, scalable, and efficient fraud detection systems, given their significant impact on financial security and the wider financial ecosystem.
ISSN: 2079-9292
DOI: 10.3390/electronics14091754
Source: Advanced Technologies & Aerospace Database
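
The abstract evaluates models on both accuracy and the handling of imbalanced datasets. The sketch below illustrates, on hypothetical toy data that does not come from the paper, why accuracy alone can mislead when fraud is rare, and how a negative-to-positive class ratio can be computed. That ratio is a commonly suggested starting value for XGBoost's scale_pos_weight parameter. Whether the authors used this setting is an assumption, and all function names here are illustrative.

```python
from collections import Counter


def imbalance_weight(labels):
    """Ratio of negative (0) to positive (1) labels.

    Often used as a starting point for a positive-class weight;
    its use in the paper is an assumption, not a reported detail.
    """
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive (fraud) examples in labels")
    return counts[0] / counts[1]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 998 legitimate transactions and 2 frauds.
y_true = [0] * 998 + [1] * 2
# A trivial "model" that labels every transaction as legitimate.
always_legit = [0] * len(y_true)

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, always_legit))
    print("precision, recall:", precision_recall(y_true, always_legit))
    print("negative/positive ratio:", imbalance_weight(y_true))
```

The trivial classifier scores high accuracy while catching no fraud at all, which is why recall and precision on the fraud class are usually reported alongside accuracy for this kind of task.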