The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World

Bibliographic Details
Published in: Entropy vol. 27, no. 10 (2025), p. 1039-1051
Main Author: Ryabko Boris
Other Authors: Savina Nadezhda, Getachew, Lulu Yeshewas, Han, Yunfei
Published: MDPI AG
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3265896178
003 UK-CbPIL
022 |a 1099-4300 
024 7 |a 10.3390/e27101039  |2 doi 
035 |a 3265896178 
045 2 |b d20250101  |b d20251231 
084 |a 231460  |2 nlm 
100 1 |a Ryabko Boris  |u Federal Research Center for Information and Computational Technologies, 6300090 Novosibirsk, Russia 
245 1 |a The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a In this paper, we apply an information-theoretic method proposed by Ryabko and Savina (hence called the RS-method), which is based on data compression, to recognize an author’s individual style in four languages from different language groups and families. The method was used to study fiction texts in Russian (East Slavic group of the Indo-European language family), Amharic (South Ethiosemitic group of the Semitic language family), Chinese (Sinitic group of the Sino-Tibetan language family) and English (West Germanic group of the Indo-European language family). It was found that the amount of data necessary for recognizing an author’s style is almost the same for all four languages, i.e., the amount of data is invariant across different language groups. The results obtained are of interest to computer science, literary studies, linguistics and, in particular, computational linguistics. 
653 |a Linguistics 
653 |a Writers 
653 |a Sino-Tibetan languages 
653 |a Hypothesis testing 
653 |a Data compression 
653 |a Russian language 
653 |a Information theory 
653 |a Indo-European languages 
653 |a Languages 
653 |a West Germanic languages 
653 |a Computational linguistics 
653 |a Methods 
653 |a Fiction 
653 |a Semitic languages 
653 |a Computer science 
653 |a Compression 
653 |a Amharic 
653 |a Information sources 
653 |a Chinese languages 
653 |a Statistical analysis 
653 |a Families & family life 
653 |a Literary criticism 
653 |a English language 
653 |a Germanic languages 
653 |a Data 
653 |a Asian cultural groups 
653 |a Slavic cultural groups 
700 1 |a Savina Nadezhda  |u Department of Information Technologies, Novosibirsk State University, 6300090 Novosibirsk, Russia; j.lulu@g.nsu.ru (Y.G.L.); yunfei.han@mail.ru (Y.H.) 
700 1 |a Getachew, Lulu Yeshewas  |u Department of Information Technologies, Novosibirsk State University, 6300090 Novosibirsk, Russia; j.lulu@g.nsu.ru (Y.G.L.); yunfei.han@mail.ru (Y.H.) 
700 1 |a Han, Yunfei  |u Department of Information Technologies, Novosibirsk State University, 6300090 Novosibirsk, Russia; j.lulu@g.nsu.ru (Y.G.L.); yunfei.han@mail.ru (Y.H.) 
773 0 |t Entropy  |g vol. 27, no. 10 (2025), p. 1039-1051 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3265896178/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3265896178/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3265896178/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch