Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost

Bibliographic Details
Published in: Electronics vol. 14, no. 9 (2025), p. 1754
Main Author: Theodorakopoulos Leonidas
Other Authors: Theodoropoulou Alexandra, Tsimakis Anastasios, Halkiopoulos Constantinos
Published: MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3203194285
003 UK-CbPIL
022 |a 2079-9292 
024 7 |a 10.3390/electronics14091754  |2 doi 
035 |a 3203194285 
045 2 |b d20250101  |b d20251231 
084 |a 231458  |2 nlm 
100 1 |a Theodorakopoulos Leonidas 
245 1 |a Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a This study presents an optimization for a distributed machine learning framework to achieve credit card fraud detection scalability. Due to the growth in fraudulent activities, this research implements the PySpark-based processing of large-scale transaction datasets, integrating advanced machine learning models: Logistic Regression, Decision Trees, Random Forests, XGBoost, and CatBoost. These have been evaluated in terms of scalability, accuracy, and handling imbalanced datasets. Key findings: Among the most promising models for complex and imbalanced data, XGBoost and CatBoost promise close-to-ideal accuracy rates in fraudulent transaction detection. PySpark will be instrumental in scaling these systems to enable them to perform distributed processing, real-time analysis, and adaptive learning. This study further discusses challenges like overfitting, data access, and real-time implementation with potential solutions such as ensemble methods, intelligent sampling, and graph-based approaches. Future directions are underlined by deploying these frameworks in live transaction environments, leveraging continuous learning mechanisms, and integrating advanced anomaly detection techniques to handle evolving fraud patterns. The present research demonstrates the importance of distributed machine learning frameworks for developing robust, scalable, and efficient fraud detection systems, considering their significant impact on financial security and the overall financial ecosystem. 
653 |a Big Data 
653 |a Machine learning 
653 |a Datasets 
653 |a Accuracy 
653 |a Adaptability 
653 |a Credit card fraud 
653 |a Fraud prevention 
653 |a Decision making 
653 |a Natural language processing 
653 |a Algorithms 
653 |a Anomalies 
653 |a Real time 
653 |a Ensemble learning 
653 |a Credit card processing 
653 |a Decision trees 
653 |a Distributed processing 
653 |a Financial institutions 
653 |a Adaptive learning 
700 1 |a Theodoropoulou Alexandra 
700 1 |a Tsimakis Anastasios 
700 1 |a Halkiopoulos Constantinos 
773 0 |t Electronics  |g vol. 14, no. 9 (2025), p. 1754 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3203194285/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3203194285/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3203194285/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch