Parallel hierarchical encoding of linguistic representations in the human auditory cortex and recurrent automatic speech recognition systems

Bibliographic details
Published in:	bioRxiv (Feb 1, 2025)
Main author:	Keshishian, Menoua
Other authors:	Mischler, Gavin; Thomas, Samuel; Kingsbury, Brian; Bickel, Stephan; Mehta, Ashesh D; Mesgarani, Nima
Published:	Cold Spring Harbor Laboratory Press
Subjects:	Signal processing; Speech; Cortex (auditory); Linguistics; Information processing; Speech recognition; Controlled conditions; Acoustics; Voice recognition; Semantics
Online access:	Citation/Abstract; Full text outside of ProQuest

MARC


LEADER	00000nab a2200000uu 4500
001	3162417189
003	UK-CbPIL
022			\|a 2692-8205
024	7		\|a 10.1101/2025.01.30.635775 \|2 doi
035			\|a 3162417189
045	0		\|b d20250201
100	1		\|a Keshishian, Menoua
245	1		\|a Parallel hierarchical encoding of linguistic representations in the human auditory cortex and recurrent automatic speech recognition systems
260			\|b Cold Spring Harbor Laboratory Press \|c Feb 1, 2025
513			\|a Working Paper
520	3		\|a The human brain's ability to transform acoustic speech signals into rich linguistic representations has inspired advancements in automatic speech recognition (ASR) systems. While ASR systems now achieve human-level performance under controlled conditions, prior research on their parallels with the brain has been limited by the use of biologically implausible models, narrow feature sets, and comparisons that primarily emphasize predictability of brain activity without fully exploring shared underlying representations. Additionally, studies comparing the brain to text-based language models overlook the acoustic stages of speech processing, an essential part of transforming sound to meaning. Leveraging high-resolution intracranial recordings and a recurrent ASR model, this study bridges these gaps by uncovering a striking correspondence in the hierarchical encoding of linguistic features, from low-level acoustic signals to high-level semantic processing. Specifically, we demonstrate that neural activity in distinct regions of the auditory cortex aligns with representations in corresponding layers of the ASR model and, crucially, that both systems encode similar features at each stage of processing - from acoustic to phonetic, lexical, and semantic information. These findings suggest that both systems, despite their distinct architectures, converge on similar strategies for language processing, providing insight into the optimal computational principles underlying linguistic representation and the shared constraints shaping human and artificial speech processing. Competing Interest Statement: The authors have declared no competing interest.
653			\|a Signal processing
653			\|a Speech
653			\|a Cortex (auditory)
653			\|a Linguistics
653			\|a Information processing
653			\|a Speech recognition
653			\|a Controlled conditions
653			\|a Acoustics
653			\|a Voice recognition
653			\|a Semantics
700	1		\|a Mischler, Gavin
700	1		\|a Thomas, Samuel
700	1		\|a Kingsbury, Brian
700	1		\|a Bickel, Stephan
700	1		\|a Mehta, Ashesh D
700	1		\|a Mesgarani, Nima
773	0		\|t bioRxiv \|g (Feb 1, 2025)
786	0		\|d ProQuest \|t Biological Science Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3162417189/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
856	4	0	\|3 Full text outside of ProQuest \|u https://www.biorxiv.org/content/10.1101/2025.01.30.635775v1