A Machine Learning-Based Approach For Detecting Malicious PyPI Packages

Bibliographic Details
Published in: arXiv.org (Dec 6, 2024), p. n/a
Main Author: Samaana, Haya
Other Authors: Diego Elias Costa, Shihab, Emad, Abdellatif, Ahmad
Published:
Cornell University Library, arXiv.org
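Subjects: Machine learning; Packages; Software development; Static code analysis; Ensemble learning; Computer programming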
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3142374147
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3142374147 
045 0 |b d20241206 
100 1 |a Samaana, Haya 
245 1 |a A Machine Learning-Based Approach For Detecting Malicious PyPI Packages 
260 |b Cornell University Library, arXiv.org  |c Dec 6, 2024 
513 |a Working Paper 
520 3 |a Background. In modern software development, the use of external libraries and packages is increasingly prevalent, streamlining the software development process and enabling developers to deploy feature-rich systems with little coding. While this reliance on reusing code offers substantial benefits, it also introduces serious risks for deployed software in the form of malicious packages - harmful and vulnerable code disguised as useful libraries. Aims. Popular ecosystems, such as PyPI, receive thousands of new package contributions every week, and distinguishing safe contributions from harmful ones presents a significant challenge. There is a dire need for reliable methods to detect and address the presence of malicious packages in these environments. Method. To address these challenges, we propose a data-driven approach that uses machine learning and static analysis to examine the package's metadata, code, files, and textual characteristics to identify malicious packages. Results. In evaluations conducted within the PyPI ecosystem, we achieved an F1-measure of 0.94 for identifying malicious packages using a stacking ensemble classifier. Conclusions. This tool can be seamlessly integrated into package vetting pipelines and has the capability to flag entire packages, not just malicious function calls. This enhancement strengthens security measures and reduces the manual workload for developers and registry maintainers, thereby contributing to the overall integrity of the ecosystem. 
653 |a Machine learning 
653 |a Packages 
653 |a Software development 
653 |a Static code analysis 
653 |a Ensemble learning 
653 |a Computer programming 
700 1 |a Diego Elias Costa 
700 1 |a Shihab, Emad 
700 1 |a Abdellatif, Ahmad 
773 0 |t arXiv.org  |g (Dec 6, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3142374147/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.05259
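
Note on the reported metric: the abstract gives an F1-measure of 0.94 but does not say how it breaks down. As a minimal sketch of the standard definition (the harmonic mean of precision and recall), the function below computes F1 from confusion-matrix counts. The function name and the counts in the usage lines are hypothetical illustrations and are not taken from the paper.

```python
def f1_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-measure for the positive (here: 'malicious') class.

    Assumption: the standard definition F1 = 2 * P * R / (P + R).
    The counts are supplied by the caller; none come from the paper.
    """
    flagged = true_pos + false_pos      # packages the classifier flagged
    actual = true_pos + false_neg       # packages that really are malicious
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only to illustrate the arithmetic.
score = f1_measure(true_pos=94, false_pos=6, false_neg=6)
print(round(score, 2))
```

With these illustrative counts, precision and recall are both 94/100, so their harmonic mean is the same value. Many different precision and recall pairs yield the same F1, so the single figure in the abstract does not by itself reveal the false-positive rate a registry maintainer would see.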