Modeling Language as Social and Cultural Data

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Li, Lucy
Published:
ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3232734592
003 UK-CbPIL
020 |a 9798288863738 
035 |a 3232734592 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Li, Lucy 
245 1 |a Modeling Language as Social and Cultural Data 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Language shows up everywhere. It's in the digital content we circulate online, and it's in our conversations with each other. It's also in the training data and generations of language models, which are increasingly integrated into our everyday lives. Language is powerful because it embeds social identities and beliefs: it expresses who we are, and shapes our understanding of each other. Thus, language is not only a window for understanding society, but also an instrument for defining it. The doctoral research presented in this thesis focuses on computational analyses of language and society. It covers several empirical studies of language, used to inform human-centered language model development and facilitate data-driven studies of people. This thesis is structured into two parts. In the first, I examine language by and for people. The work here incorporates a sociolinguistic perspective, emphasizing how the social identities of communities may relate to language differences. I map how language varies in communities at scale, and measure whose language is prioritized in the early stages of model development. In the second part, I examine language about people. There, I show how text analysis can characterize discussions and depictions of people. The studies in this part span the social dimensions of gender and race, and demonstrate how methods ranging from word representations to topic modeling can help make sense of socially significant language patterns. Altogether, this thesis ties together multiple ways in which people and language may intersect, and uses computational text analysis to benefit both social scientific inquiry and NLP methodology. 
653 |a Information science 
653 |a Computer science 
653 |a Web studies 
653 |a Sociolinguistics 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3232734592/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3232734592/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch