Quantifying cross-language code reuse via function-level clone detection
| Published in: | Journal of King Saud University. Computer and Information Sciences vol. 37, no. 10 (Dec 2025), p. 327 |
|---|---|
| Main author: | Rong, Yi |
| Other authors: | Zhou, Yan |
| Published: | Springer Nature B.V. |
| Subjects: | |
| Online access: | Citation/Abstract Full Text Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3274025682 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 1319-1578 | ||
| 024 | 7 | |a 10.1007/s44443-025-00362-2 |2 doi | |
| 035 | |a 3274025682 | ||
| 045 | 2 | |b d20251201 |b d20251231 | |
| 100 | 1 | |a Rong, Yi |u The University of New South Wales, School of Education, New South Wales, Australia (GRID:grid.1005.4) (ISNI:0000 0004 4902 0432) | |
| 245 | 1 | |a Quantifying cross-language code reuse via function-level clone detection | |
| 260 | |b Springer Nature B.V. |c Dec 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a Code reuse through cloning is common in software development, yet excessive or unchecked cloning can harm maintainability and raise plagiarism concerns. Detecting the proportion of reused (cloned) code in a software project, especially across different programming languages, is a challenging task. This paper defines code reuse proportion detection as measuring how much code in a target program is cloned (identical or similar) from elsewhere. Existing code clone detection techniques perform well in single-language settings but struggle with cross-language clones and do not directly quantify reuse proportion. To address these gaps, we propose a novel cross-language function-level code clone detection approach using a dual embedding Siamese neural network. Our method represents code in Java and Python using a unified abstract syntax structure and semantic embeddings, then uses a Siamese deep network to learn language-agnostic similarities. We also introduce a metric to quantify the clone-based reuse ratio for each function or program. Experiments on three public datasets (including a Java clone benchmark, a Python code clone corpus, and a cross-language Java–Python clone dataset) show that our approach outperforms ten baseline methods, including state-of-the-art and classical clone detectors. Ablation studies confirm the contribution of each component (structural embeddings, cross-language alignment, and contrastive learning) to performance gains. Our model achieves new state-of-the-art accuracy in code clone detection, enabling precise measurement of code reuse. These results demonstrate that the proposed approach can effectively detect cross-language code clones and quantify reuse proportion, benefiting software plagiarism detection and code quality assessment in multi-language projects. | |
| 653 | |a Language | ||
| 653 | |a Maintainability | ||
| 653 | |a Software | ||
| 653 | |a Datasets | ||
| 653 | |a Quality assessment | ||
| 653 | |a Java | ||
| 653 | |a Deep learning | ||
| 653 | |a Neural networks | ||
| 653 | |a Syntax | ||
| 653 | |a Artificial neural networks | ||
| 653 | |a Cloning | ||
| 653 | |a Programming languages | ||
| 653 | |a Sensors | ||
| 653 | |a Ablation | ||
| 653 | |a Python | ||
| 653 | |a Plagiarism | ||
| 653 | |a Software reuse | ||
| 653 | |a Code reuse | ||
| 653 | |a Software development | ||
| 653 | |a Semantics | ||
| 700 | 1 | |a Zhou, Yan |u South China Agricultural University, College of Mathematics and Informatics, Guangdong, China (GRID:grid.20561.30) (ISNI:0000 0000 9546 5767) | |
| 773 | 0 | |t Journal of King Saud University. Computer and Information Sciences |g vol. 37, no. 10 (Dec 2025), p. 327 | |
| 786 | 0 | |d ProQuest |t Computer Science Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3274025682/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text |u https://www.proquest.com/docview/3274025682/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3274025682/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
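The abstract mentions a metric that quantifies the clone-based reuse ratio for each function or program but does not define it in this record. The sketch below is one plausible, hypothetical reading, not the authors' formula. It assumes each function already has an embedding vector (standing in for the output of the paper's Siamese network). A target function counts as reused when its best cosine similarity to any reference function meets a threshold, and the program-level ratio is the share of target lines of code that belong to such functions. The names `reuse_ratio` and `best_match_score`, the 0.8 threshold, and the LOC weighting are all illustrative assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def best_match_score(target: Vector, references: Sequence[Vector]) -> float:
    """Per-function score (hypothetical): highest similarity to any reference function."""
    return max((cosine(target, r) for r in references), default=0.0)


def reuse_ratio(
    targets: Sequence[Vector],
    target_loc: Sequence[int],
    references: Sequence[Vector],
    threshold: float = 0.8,  # assumed clone threshold, not taken from the paper
) -> float:
    """Program-level ratio (hypothetical): LOC in cloned functions / total LOC."""
    total = sum(target_loc)
    if total == 0:
        return 0.0
    cloned = sum(
        loc
        for emb, loc in zip(targets, target_loc)
        if best_match_score(emb, references) >= threshold
    )
    return cloned / total


if __name__ == "__main__":
    # Toy 2-D "embeddings": only the first function matches the reference closely enough.
    targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    loc = [10, 20, 30]
    refs = [[1.0, 0.0]]
    print(round(reuse_ratio(targets, loc, refs), 4))  # prints 0.1667
```

Weighting by lines of code keeps a large cloned function from counting the same as a one-line helper. A plain function count is an equally reasonable alternative under a different assumption.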