Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge

Bibliographic details
Published in: Behavior Research Methods (Online) vol. 57, no. 1 (Jan 2025), p. 28
Main author: Brysbaert, Marc
Other authors: Martínez, Gonzalo; Reviriego, Pedro
Published: Springer Nature B.V.
Online access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3287487415
003 UK-CbPIL
022 |a 1554-3528 
024 7 |a 10.3758/s13428-024-02561-7  |2 doi 
035 |a 3287487415 
045 2 |b d20250101  |b d20250131 
084 |a 162337  |2 nlm 
100 1 |a Brysbaert, Marc  |u Ghent University, Department of Experimental Psychology, Ghent, Belgium (GRID:grid.5342.0) (ISNI:0000 0001 2069 7798) 
245 1 |a Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge 
260 |b Springer Nature B.V.  |c Jan 2025 
513 |a Journal Article 
520 3 |a This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures. We then applied LLM estimates to MWEs, also finding their effectiveness in measuring familiarity for these expressions. We have created a list of more than 400,000 English words and MWEs with LLM-generated familiarity estimates, which we hope will be a valuable resource for researchers. There is also a cleaned-up list of nearly 150,000 entries, excluding lesser-known stimuli, to streamline stimulus selection. Our findings highlight the advantages of LLM-based familiarity estimates, including their better performance than traditional word frequency measures (particularly for predicting word recognition accuracy), their ability to generalize to MWEs, availability for large lists of words, and ease of obtaining new estimates for all types of stimuli. 
653 |a Language 
653 |a Naming 
653 |a Students 
653 |a Stimuli 
653 |a Familiarity 
653 |a Word processing 
653 |a Word recognition 
653 |a Pattern recognition 
653 |a Stimulus 
653 |a Counting 
653 |a Word frequency 
653 |a Ratings & rankings 
653 |a Language modeling 
653 |a Large language models 
653 |a Efficiency 
653 |a Semantics 
700 1 |a Martínez, Gonzalo  |u Universidad Carlos III de Madrid, Leganés, Spain (GRID:grid.7840.b) (ISNI:0000 0001 2168 9183) 
700 1 |a Reviriego, Pedro  |u ETSI de Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain (GRID:grid.5690.a) (ISNI:0000 0001 2151 2978) 
773 0 |t Behavior Research Methods (Online)  |g vol. 57, no. 1 (Jan 2025), p. 28 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3287487415/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3287487415/fulltext/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3287487415/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch