Generative AI-powered multilingual ASR for seamless language-mixing transcriptions

Bibliographic Details
Published in: Journal of Electrical Systems and Information Technology vol. 12, no. 1 (Dec 2025), p. 42
Main Author: Dash, Puspita
Other Authors: Babu, Sruthi; Singaravel, Logeswari; Balasubramanian, Devadarshini
Published:
Springer Nature B.V.
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3230018464
003 UK-CbPIL
022 |a 2314-7172 
024 7 |a 10.1186/s43067-025-00204-1  |2 doi 
035 |a 3230018464 
045 2 |b d20251201  |b d20251231 
100 1 |a Dash, Puspita  |u Sri Manakula Vinayagar Engineering College, Department of Information Technology, Madagadipet, India 
245 1 |a Generative AI-powered multilingual ASR for seamless language-mixing transcriptions 
260 |b Springer Nature B.V.  |c Dec 2025 
513 |a Journal Article 
520 3 |a In a bilingual and linguistically diverse country like India, where a significant portion of the population is fluent in multiple languages, the conventional bilingual Transformer neural network architecture faces challenges in accurately translating conversations that seamlessly switch between different languages. In this paper, we propose a multilingual automatic speech recognition system that can understand all intra-sentential terms and transcribe human speech into written text in English or any other language without grammatical mistakes. As a result, this method works well for translating Tanglish to Tamil or English. This is accomplished with the help of generative AI. Here, we use a generative pre-trained transformer model, which learns to predict the subsequent word in a language during the pre-training stage in order to gain an understanding of language structure and semantics. The algorithm used here, long short-term memory (LSTM), plays a crucial role in speech-to-text conversion by capturing temporal dependencies, maintaining context, and generating accurate transcriptions from audio inputs. We experimented on 50 Tamil–English agriculture-based data samples and found that the generative pre-trained transformer model can achieve an 84.37% relative accuracy rate even for short sentences and a 73.98% relative accuracy rate for lengthy sentences in bilingual automatic speech recognition (ASR) performance. 
653 |a Language 
653 |a Translating 
653 |a Machine learning 
653 |a Accuracy 
653 |a Semantics 
653 |a Neural networks 
653 |a Languages 
653 |a Voice recognition 
653 |a Generative artificial intelligence 
653 |a Natural language processing 
653 |a Multilingualism 
653 |a Linguistics 
653 |a Algorithms 
653 |a Audio data 
653 |a Automation 
653 |a Automatic speech recognition 
653 |a Bilingualism 
653 |a Speech 
653 |a English language 
653 |a Sentences 
700 1 |a Babu, Sruthi  |u Sri Manakula Vinayagar Engineering College, Department of Information Technology, Madagadipet, India 
700 1 |a Singaravel, Logeswari  |u Sri Manakula Vinayagar Engineering College, Department of Information Technology, Madagadipet, India 
700 1 |a Balasubramanian, Devadarshini  |u Sri Manakula Vinayagar Engineering College, Department of Information Technology, Madagadipet, India 
773 0 |t Journal of Electrical Systems and Information Technology  |g vol. 12, no. 1 (Dec 2025), p. 42 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3230018464/abstract/embedded/J7RWLIQ9I3C9JK51?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3230018464/fulltext/embedded/J7RWLIQ9I3C9JK51?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3230018464/fulltextPDF/embedded/J7RWLIQ9I3C9JK51?source=fedsrch
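
Illustrative note (not part of the catalog record): the abstract in field 520 above credits a long short-term memory (LSTM) network with capturing temporal dependencies for speech-to-text. The sketch below is a minimal PyTorch-style outline of such an LSTM acoustic model, assuming log-mel spectrogram input frames and a hypothetical character vocabulary; the class name, dimensions, and CTC-style output head are assumptions for illustration, not the authors' published implementation.

# Illustrative sketch only: a minimal LSTM speech-to-text acoustic model.
# Assumes 80-dim log-mel frames and a hypothetical 60-character vocabulary.
import torch
import torch.nn as nn

class LSTMSpeechToText(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab_size=60):
        super().__init__()
        # Bidirectional LSTM captures temporal dependencies across audio frames.
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        # Project each frame onto the vocabulary (+1 for a CTC blank symbol).
        self.fc = nn.Linear(hidden * 2, vocab_size + 1)

    def forward(self, mels):                 # mels: (batch, time, n_mels)
        out, _ = self.lstm(mels)             # out: (batch, time, 2*hidden)
        return self.fc(out).log_softmax(-1)  # per-frame log-probabilities

# Example forward pass on a dummy utterance (hypothetical shapes).
model = LSTMSpeechToText()
dummy = torch.randn(1, 300, 80)              # 300 frames of 80-dim log-mel features
log_probs = model(dummy)                     # (1, 300, 61), decodable with CTC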