Leveraging Large Language Models for Intelligent Construction Management Systems

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Zhong, Yunshun
Published: ProQuest Dissertations & Theses
Subjects: Computer science, Artificial intelligence, Civil engineering
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3276241657
003 UK-CbPIL
020 |a 9798265439529 
035 |a 3276241657 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Zhong, Yunshun 
245 1 |a Leveraging Large Language Models for Intelligent Construction Management Systems 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a The Construction Management Systems (CMS) domain increasingly depends on unstructured text (inspection reports, technical documents, incident logs), creating an opportunity for domain-aware Natural Language Processing (NLP) with Large Language Models (LLMs). Yet general-domain pre-training often misses domain-specific terminology and context, limiting precision and accuracy for tasks in the CMS domain. This gap motivates domain-specific LLMs along two tracks: (i) a discriminative, encoder-based Transformer system for classification and regression on agency documents to support risk assessment, resource allocation, and cost estimation (Chapter 2); and (ii) a generative, decoder-based question-answering system that grounds answers in project documents while leveraging model priors (Chapter 3). The retrieval-augmented generation (RAG) and prompt engineering pipeline of Chapter 3 is then adapted to a range of data-mining and analysis tasks across the domain in Chapters 4–5. Chapter 2 develops the first dedicated CMS corpus and an end-to-end pipeline for pre-training language models on domain text. After domain-specific pre-training and fine-tuning, these models outperform general models on two representative tasks, structural condition assessment and building compliance checking, with F1 improvements of 5.9% and 8.5%, respectively, underscoring the value of domain-specific pre-training. Chapter 3 builds an agency-specific project-authoring advisor using RAG with prompt engineering (persona, format template, chain-of-thought, few-shot learning). In an evaluation grounded in agency documentation, GPT-4 with RAG and optimized prompts scores 88.9/100, versus 75.7 with RAG only and 53.4 without RAG, and significantly surpasses conventional search methods. Chapter 4 releases a public, metadata-rich dataset of 1,100 CMS publications with annual citations, then compares topic extraction via Latent Dirichlet Allocation (LDA) and an automated LLM-RAG pipeline, both evaluated against expert-labeled topics. The LLM-RAG approach achieves far higher agreement (85.94× Jaccard, 8.14× BLEU, 32.11× ROUGE) and reveals research trends by analyzing topics and citations. Chapter 5 adapts the “Sleeping Beauty” framework to perform the first systematic analysis of papers with delayed recognition in CMS, showing that delayed recognition is more prevalent than assumed and cautioning against short-horizon citation metrics. Collectively, this thesis demonstrates that combining domain-specific datasets and pre-training with RAG and specialized prompt engineering delivers accurate, auditable decision support, advancing evidence-based planning, regulatory compliance, and operational efficiency across the CMS sector. 
653 |a Computer science 
653 |a Artificial intelligence 
653 |a Civil engineering 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3276241657/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3276241657/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch
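
The project-authoring advisor described in the abstract (Chapter 3) combines retrieval-augmented generation with an engineered prompt (persona, format template, chain-of-thought cue, few-shot example). The sketch below is a minimal, hypothetical illustration of that pattern, not the author's implementation: the sample documents, persona text, TF-IDF retriever, and the call_llm placeholder are all assumptions introduced for illustration.

```python
# Minimal RAG + prompt-engineering sketch (illustrative only).
# Retrieval uses TF-IDF cosine similarity as a stand-in retriever;
# call_llm is a hypothetical placeholder for a generative model API,
# not the thesis's actual pipeline.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy agency/project documents to ground answers in (assumed content).
DOCUMENTS = [
    "Section 4.2: Structural condition assessments must be filed within 30 days of inspection.",
    "Section 7.1: Cost estimates shall itemize labor, materials, and contingency reserves.",
    "Section 9.3: Compliance checks reference the current state building code edition.",
]

PERSONA = "You are a construction-management project-authoring advisor for a public agency."
FORMAT_TEMPLATE = "Answer with: (1) a direct answer, (2) the cited document section, (3) caveats."
FEW_SHOT = (
    "Q: When is an incident log due?\n"
    "A: (1) Within 24 hours. (2) Section 2.5. (3) Weekends extend the deadline to the next business day.\n"
)
COT_CUE = "Think through the relevant sections step by step before answering."


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF cosine similarity)."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(docs)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble persona, format template, few-shot example, CoT cue, and retrieved context."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        f"{PERSONA}\n\n{FORMAT_TEMPLATE}\n\n"
        f"Example:\n{FEW_SHOT}\n"
        f"Context:\n{context_block}\n\n"
        f"{COT_CUE}\n\nQ: {query}\nA:"
    )


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a generative model call (e.g. a hosted LLM API)."""
    raise NotImplementedError("Plug in an LLM client here.")


if __name__ == "__main__":
    question = "When must a structural condition assessment be filed?"
    prompt = build_prompt(question, retrieve(question, DOCUMENTS))
    print(prompt)  # Inspect the grounded, engineered prompt before sending it to a model.
```

Swapping the TF-IDF retriever for a dense embedding index and wiring call_llm to a hosted model such as GPT-4 would bring this sketch closer in spirit to the setup evaluated in Chapter 3, though the thesis's exact retrieval and prompting details are not reproduced here.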