Behavioural Analysis of Malware by Selecting Influential API Through TF-IDF API Embeddings

Bibliographic Details
Published in: International Journal of Advanced Computer Science and Applications vol. 16, no. 5 (2025)
Main author: PDF
Published:
Science and Information (SAI) Organization Limited
Subjects:
Online access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3222641054
003 UK-CbPIL
022 |a 2158-107X 
022 |a 2156-5570 
024 7 |a 10.14569/IJACSA.2025.0160575  |2 doi 
035 |a 3222641054 
045 2 |b d20250101  |b d20251231 
100 1 |a PDF 
245 1 |a Behavioural Analysis of Malware by Selecting Influential API Through TF-IDF API Embeddings 
260 |b Science and Information (SAI) Organization Limited  |c 2025 
513 |a Journal Article 
520 3 |a The constant threat of malware makes studying its behavior an ongoing task. Malware identification and classification are better addressed by analyzing software behavior than by relying on conventional hash-based signatures. The sequence of API calls collected while a program executes represents that program's behavior. Using API sequences gathered while malware was executed under controlled conditions, this work addresses the problem of choosing influential APIs for malware classification. The proposed feature selection method, SelectAPI, uses TF-IDF API embeddings to select key features, i.e., the significant APIs that classify malware better. Two machine learning models, Random Forest, which implicitly ensembles several estimators, and Support Vector Classifier, a standard non-linear model, are trained and evaluated to validate the importance of the chosen APIs. SelectAPI achieves accuracy, macro-average precision, macro-average recall, and macro-average F1-score of 0.76, 0.77, 0.76, and 0.76, respectively, on MAL-API-2019, an open benchmark multiclass malware dataset based on dynamic API sequences. These results surpass the previously best-known accuracy of 0.60 and the reported F1-score of 0.61. 
653 |a Accuracy 
653 |a Machine learning 
653 |a Malware 
653 |a Feature selection 
653 |a Software 
653 |a Vocational education 
653 |a Methods 
653 |a Datasets 
653 |a Computer science 
653 |a Classification 
773 0 |t International Journal of Advanced Computer Science and Applications  |g vol. 16, no. 5 (2025) 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3222641054/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3222641054/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch