Quantifying cross-language code reuse via function-level clone detection

Bibliographic Details
Published in:	Journal of King Saud University. Computer and Information Sciences vol. 37, no. 10 (Dec 2025), p. 327
Main Author:	Rong, Yi
Other Authors:	Zhou, Yan
Published:	Springer Nature B.V.
Subjects:	Language; Maintainability; Software; Datasets; Quality assessment; Java; Deep learning; Neural networks; Syntax; Artificial neural networks; Cloning; Programming languages; Sensors; Ablation; Python; Plagiarism; Software reuse; Code reuse; Software development; Semantics
Online Access:	Citation/Abstract; Full Text; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3274025682
003	UK-CbPIL
022			\|a 1319-1578
024	7		\|a 10.1007/s44443-025-00362-2 \|2 doi
035			\|a 3274025682
045	2		\|b d20251201 \|b d20251231
100	1		\|a Rong, Yi \|u The University of New South Wales, School of Education, New South Wales, Australia (GRID:grid.1005.4) (ISNI:0000 0004 4902 0432)
245	1		\|a Quantifying cross-language code reuse via function-level clone detection
260			\|b Springer Nature B.V. \|c Dec 2025
513			\|a Journal Article
520	3		\|a Code reuse through cloning is common in software development, yet excessive or unchecked cloning can harm maintainability and raise plagiarism concerns. Detecting the proportion of reused (cloned) code in a software project, especially across different programming languages, is a challenging task. This paper defines code reuse proportion detection as measuring how much code in a target program is cloned (identical or similar) from elsewhere. Existing code clone detection techniques perform well in single-language settings but struggle with cross-language clones and do not directly quantify reuse proportion. To address these gaps, we propose a novel cross-language function-level code clone detection approach using a dual embedding Siamese neural network. Our method represents code in Java and Python using a unified abstract syntax structure and semantic embeddings, then uses a Siamese deep network to learn language-agnostic similarities. We also introduce a metric to quantify the clone-based reuse ratio for each function or program. Experiments on three public datasets (including a Java clone benchmark, a Python code clone corpus, and a cross-language Java–Python clone dataset) show that our approach outperforms ten baseline methods, including state-of-the-art and classical clone detectors. Ablation studies confirm the contribution of each component (structural embeddings, cross-language alignment, and contrastive learning) to performance gains. Our model achieves new state-of-the-art accuracy in code clone detection, enabling precise measurement of code reuse. These results demonstrate that the proposed approach can effectively detect cross-language code clones and quantify reuse proportion, benefiting software plagiarism detection and code quality assessment in multi-language projects.
653			\|a Language
653			\|a Maintainability
653			\|a Software
653			\|a Datasets
653			\|a Quality assessment
653			\|a Java
653			\|a Deep learning
653			\|a Neural networks
653			\|a Syntax
653			\|a Artificial neural networks
653			\|a Cloning
653			\|a Programming languages
653			\|a Sensors
653			\|a Ablation
653			\|a Python
653			\|a Plagiarism
653			\|a Software reuse
653			\|a Code reuse
653			\|a Software development
653			\|a Semantics
700	1		\|a Zhou, Yan \|u South China Agricultural University, College of Mathematics and Informatics, Guangdong, China (GRID:grid.20561.30) (ISNI:0000 0000 9546 5767)
773	0		\|t Journal of King Saud University. Computer and Information Sciences \|g vol. 37, no. 10 (Dec 2025), p. 327
786	0		\|d ProQuest \|t Computer Science Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3274025682/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	\|3 Full Text \|u https://www.proquest.com/docview/3274025682/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3274025682/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch