Intelligent text similarity assessment using Roberta with integrated chaotic perturbation optimization techniques

Bibliographic Details
Published in: Journal of Big Data vol. 12, no. 1 (Jul 2025), p. 164
Main Author: Hassan, Esraa
Other Authors: Talaat, Amira Samy; Elsabagh, M. A.
Published: Springer Nature B.V.
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3229357039
003 UK-CbPIL
022 |a 2196-1115 
024 7 |a 10.1186/s40537-025-01233-3  |2 doi 
035 |a 3229357039 
045 2 |b d20250701  |b d20250731 
100 1 |a Hassan, Esraa  |u Kafrelsheikh University, Department of Machine Learning and Information Retrieval, Faculty of Artificial Intelligence, Kafrelsheikh, Egypt (GRID:grid.411978.2) (ISNI:0000 0004 0578 3577) 
245 1 |a Intelligent text similarity assessment using Roberta with integrated chaotic perturbation optimization techniques 
260 |b Springer Nature B.V.  |c Jul 2025 
513 |a Journal Article 
520 3 |a Precisely evaluating text similarity remains a fundamental challenge in Natural Language Processing (NLP), with widespread applications in plagiarism detection, information retrieval, semantic analysis, and recommendation systems. Traditional approaches often suffer from overfitting, local optima stagnation, and difficulty capturing deep semantic relationships. To address these challenges, this paper introduces an Intelligent Text Similarity Assessment Model that integrates Robustly Optimized Bidirectional Encoder Representations from Transformers (RoBERTa) with Chaotic Sand Cat Swarm Optimization (CHSCSO), a novel swarm intelligence-based optimization method inspired by chaotic dynamics. The model leverages RoBERTa’s robust contextual embeddings to extract deep semantic representations while utilizing CHSCSO’s controlled chaotic perturbations to optimize hyperparameters dynamically. This integration enhances model generalization, mitigates overfitting, and improves the trade-off between exploration and exploitation during training. CHSCSO refines the parameter search space by employing chaotic maps, ensuring a more adaptive and efficient training process. Extensive experiments on multiple benchmark datasets, including Semantic Textual Similarity (STS) and Textual Entailment (TE), demonstrate the model’s superiority over standard RoBERTa fine-tuning and conventional baselines that reach cosine similarity scores that are clustered at 0.996. The optimized model achieves higher accuracy and improved stability and exhibits faster convergence in text similarity tasks. 
653 |a Language 
653 |a Similarity 
653 |a Dictionaries 
653 |a Accuracy 
653 |a Semantics 
653 |a Swarm intelligence 
653 |a Deep learning 
653 |a Datasets 
653 |a Recommender systems 
653 |a Sentiment analysis 
653 |a Information retrieval 
653 |a Optimization techniques 
653 |a Perturbation 
653 |a Optimization 
653 |a Methods 
653 |a Natural language processing 
653 |a Multilingualism 
653 |a Dialects 
653 |a Representations 
653 |a Efficiency 
653 |a Big Data 
653 |a Plagiarism 
653 |a Experiments 
653 |a Exploitation 
653 |a Training 
653 |a Data mining 
653 |a Stagnation 
653 |a Retrieval 
653 |a Convergence 
653 |a Semantic analysis 
653 |a Bidirectionality 
653 |a Entailment 
653 |a Intelligence 
700 1 |a Talaat, Amira Samy  |u Electronics Research Institute, Computers and Systems Department, Cairo, Egypt (GRID:grid.463242.5) (ISNI:0000 0004 0387 2680) 
700 1 |a Elsabagh, M. A.  |u Kafrelsheikh University, Department of Machine Learning and Information Retrieval, Faculty of Artificial Intelligence, Kafrelsheikh, Egypt (GRID:grid.411978.2) (ISNI:0000 0004 0578 3577) 
773 0 |t Journal of Big Data  |g vol. 12, no. 1 (Jul 2025), p. 164 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3229357039/abstract/embedded/75I98GEZK8WCJMPQ?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3229357039/fulltext/embedded/75I98GEZK8WCJMPQ?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3229357039/fulltextPDF/embedded/75I98GEZK8WCJMPQ?source=fedsrch