Generative AI-powered multilingual ASR for seamless language-mixing transcriptions

Bibliographic Details
Published in:	Journal of Electrical Systems and Information Technology vol. 12, no. 1 (Dec 2025), p. 42
Main Author:	Dash, Puspita
Other Authors:	Babu, Sruthi; Singaravel, Logeswari; Balasubramanian, Devadarshini
Published:	Springer Nature B.V.
Subjects:	Language; Translating; Machine learning; Accuracy; Semantics; Neural networks; Languages; Voice recognition; Generative artificial intelligence; Natural language processing; Multilingualism; Linguistics; Algorithms; Audio data; Automation; Automatic speech recognition; Bilingualism; Speech; English language; Sentences

Description
Abstract:	In a bilingual and linguistically diverse country like India, where a significant portion of the population is fluent in multiple languages, the conventional bilingual Transformer neural network architecture struggles to accurately translate conversations that switch seamlessly between languages. In this paper, we propose a multilingual automatic speech recognition (ASR) system that can understand all intra-sentential terms and transcribe human speech into written text in English or any other language without making grammatical mistakes. As a result, this method works well for translating Tanglish (code-mixed Tamil and English) into Tamil or English. This is accomplished with the help of generative AI. We use a generative pre-trained transformer model, which learns to predict the next word during the pre-training stage in order to acquire an understanding of language structure and semantics. A long short-term memory (LSTM) network plays a crucial role in speech-to-text conversion by capturing temporal dependencies, maintaining context, and generating accurate transcriptions from audio inputs. We experimented on 50 Tamil–English agriculture-related data items and found that the generative pre-trained transformer model can achieve a relative accuracy rate of 84.37% for short sentences and 73.98% for lengthy sentences in bilingual ASR performance.
ISSN:	2314-7172
DOI:	10.1186/s43067-025-00204-1
Source:	Engineering Database