Do LLMs Really Understand SQL?

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Rahaman, Ananya
Published:
ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3235004947
003 UK-CbPIL
020 |a 9798290628127 
035 |a 3235004947 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Rahaman, Ananya 
245 1 |a Do LLMs Really Understand SQL? 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a The rise of large language models (LLMs) has significantly influenced various fields, including natural language processing (NLP) and image generation, by making complex computational tasks more accessible. Despite their remarkable generative capabilities, a fundamental question remains regarding their level of understanding, particularly in structured domains such as SQL, where precise logic and syntactic accuracy are essential. This work evaluates the extent to which LLMs comprehend SQL by assessing their performance on key tasks, including syntax error detection, missing token identification, query performance prediction, query equivalence checking, and query explanation. These tasks collectively examine the models' ability to recognize patterns, maintain context awareness, interpret semantics, and ensure logical coherence, capabilities that are critical for genuine SQL understanding. To enable a rigorous evaluation, we construct labeled datasets from well-established SQL workloads and conduct extensive experiments on state-of-the-art LLMs. Our analysis specifically investigates how query complexity and distinct syntactic features impact model performance. The results indicate that while models such as GPT-4 excel in tasks that rely on pattern recognition and contextual awareness, they exhibit persistent difficulties in deeper semantic understanding and logical consistency. These challenges are particularly evident in tasks such as accurately predicting query performance and verifying query equivalence. This gap suggests that current LLMs, despite their syntactic and structural proficiency, lack the ability to integrate the deeper semantic reasoning required for comprehensive SQL comprehension. Our findings underscore the need for future advancements in LLMs to focus on improving their reasoning abilities and their capacity to incorporate domain-specific knowledge. Enhancing these aspects would enable a transition from syntactic fluency to a more logic-driven understanding, thereby unlocking the full potential of SQL in various computational applications. 
653 |a Language 
653 |a Failure 
653 |a User experience 
653 |a Syntax 
653 |a Human performance 
653 |a Error correction & detection 
653 |a Recommender systems 
653 |a Benchmarks 
653 |a Data processing 
653 |a Query formulation 
653 |a Natural language processing 
653 |a Workloads 
653 |a Keywords 
653 |a Large language models 
653 |a Cognition & reasoning 
653 |a Semantics 
653 |a Skills 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3235004947/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3235004947/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://ir.lib.uwo.ca/etd/10783