Quantifying cross-language code reuse via function-level clone detection

Bibliographic details
Published in: Journal of King Saud University. Computer and Information Sciences vol. 37, no. 10 (Dec 2025), p. 327
Main author: Rong, Yi
Other authors: Zhou, Yan
Published:
Springer Nature B.V.
Subjects:
Online access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3274025682
003 UK-CbPIL
022 |a 1319-1578 
024 7 |a 10.1007/s44443-025-00362-2  |2 doi 
035 |a 3274025682 
045 2 |b d20251201  |b d20251231 
100 1 |a Rong, Yi  |u The University of New South Wales, School of Education, New South Wales, Australia (GRID:grid.1005.4) (ISNI:0000 0004 4902 0432) 
245 1 |a Quantifying cross-language code reuse via function-level clone detection 
260 |b Springer Nature B.V.  |c Dec 2025 
513 |a Journal Article 
520 3 |a Code reuse through cloning is common in software development, yet excessive or unchecked cloning can harm maintainability and raise plagiarism concerns. Detecting the proportion of reused (cloned) code in a software project, especially across different programming languages, is a challenging task. This paper defines code reuse proportion detection as measuring how much code in a target program is cloned (identical or similar) from elsewhere. Existing code clone detection techniques perform well in single-language settings but struggle with cross-language clones and do not directly quantify reuse proportion. To address these gaps, we propose a novel cross-language function-level code clone detection approach using a dual embedding Siamese neural network. Our method represents code in Java and Python using a unified abstract syntax structure and semantic embeddings, then uses a Siamese deep network to learn language-agnostic similarities. We also introduce a metric to quantify the clone-based reuse ratio for each function or program. Experiments on three public datasets (including a Java clone benchmark, a Python code clone corpus, and a cross-language Java–Python clone dataset) show that our approach outperforms ten baseline methods, including state-of-the-art and classical clone detectors. Ablation studies confirm the contribution of each component (structural embeddings, cross-language alignment, and contrastive learning) to performance gains. Our model achieves new state-of-the-art accuracy in code clone detection, enabling precise measurement of code reuse. These results demonstrate that the proposed approach can effectively detect cross-language code clones and quantify reuse proportion, benefiting software plagiarism detection and code quality assessment in multi-language projects. 
653 |a Language 
653 |a Maintainability 
653 |a Software 
653 |a Datasets 
653 |a Quality assessment 
653 |a Java 
653 |a Deep learning 
653 |a Neural networks 
653 |a Syntax 
653 |a Artificial neural networks 
653 |a Cloning 
653 |a Programming languages 
653 |a Sensors 
653 |a Ablation 
653 |a Python 
653 |a Plagiarism 
653 |a Software reuse 
653 |a Code reuse 
653 |a Software development 
653 |a Semantics 
700 1 |a Zhou, Yan  |u South China Agricultural University, College of Mathematics and Informatics, Guangdong, China (GRID:grid.20561.30) (ISNI:0000 0000 9546 5767) 
773 0 |t Journal of King Saud University. Computer and Information Sciences  |g vol. 37, no. 10 (Dec 2025), p. 327 
786 0 |d ProQuest  |t Computer Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3274025682/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3274025682/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3274025682/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch