Leveraging Large Language Models for Intelligent Construction Management Systems
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main author: | |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | |
| Online access: | Citation/Abstract Full Text - PDF |
| Abstract: | The Construction Management Systems (CMS) domain increasingly depends on unstructured text (inspection reports, technical documents, incident logs), creating an opportunity for domain-aware Natural Language Processing (NLP) with Large Language Models (LLMs). Yet general-domain pre-training often misses domain-specific terminology and context, limiting precision and accuracy for tasks in the CMS domain. This gap motivates domain-specific LLMs along two tracks: (i) a discriminative, encoder-based Transformer system for classification and regression on agency documents to support risk assessment, resource allocation, and cost estimation (Chapter 2); and (ii) a generative, decoder-based question-answering system that grounds answers in project documents while leveraging model priors (Chapter 3). The retrieval-augmented generation (RAG) and prompt-engineering pipeline in Chapter 3 is adapted for various data-mining and analysis tasks across the domain, as shown in Chapters 4–5. Chapter 2 develops the first dedicated CMS corpus and an end-to-end pipeline for pre-training language models on domain text. After domain-specific pre-training and fine-tuning, these models outperform general models on two representative tasks, structural condition assessment and building compliance checking, with F1 improvements of 5.9% and 8.5%, respectively, underscoring the value of domain-specific pre-training. Chapter 3 builds an agency-specific project-authoring advisor using RAG with prompt engineering (persona, format template, chain-of-thought, few-shot learning). In an evaluation grounded in agency documentation, GPT-4 with RAG and optimized prompts scores 88.9/100, versus 75.7 with RAG only and 53.4 without RAG, and significantly surpasses conventional search methods. Chapter 4 releases a public, metadata-rich dataset of 1,100 CMS publications with annual citations, then compares topic extraction via Latent Dirichlet Allocation (LDA) and an automated LLM-RAG pipeline evaluated against expert-labeled topics. The LLM-RAG approach achieves far higher agreement (85.94× Jaccard, 8.14× BLEU, 32.11× ROUGE) and reveals research trends by analyzing topics and citations. Chapter 5 adapts the “Sleeping Beauty” framework to perform the first systematic analysis of papers with delayed recognition in CMS, showing delayed recognition is more prevalent than assumed and cautioning against short-horizon citation metrics. Collectively, this thesis demonstrates that combining domain-specific datasets and pre-training with RAG and specialized prompt engineering delivers accurate, auditable decision support, advancing evidence-based planning, regulatory compliance, and operational efficiency across the CMS sector. |
| ISBN: | 9798265439529 |
| Source: | ProQuest Dissertations & Theses Global |
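The Chapter 3 advisor described in the abstract combines retrieval-augmented generation with a structured prompt (persona, format template, few-shot example, chain-of-thought cue). A minimal sketch of how such a prompt might be assembled is shown below; the toy corpus, keyword-overlap retriever, and prompt wording are hypothetical illustrations, not the dissertation's actual implementation.

```python
# Sketch of a RAG prompt-assembly step in the style described for the
# Chapter 3 advisor. All documents and prompt text here are hypothetical
# stand-ins; a real system would use a vector retriever and an LLM call.

def retrieve(query, corpus, k=2):
    """Rank documents by keyword overlap with the query (toy retriever)."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: -len(q_terms & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, passages):
    """Combine persona, format template, few-shot example, retrieved
    context, and a chain-of-thought cue into one grounded prompt."""
    persona = "You are a construction-management project-authoring advisor."
    fmt = "Answer in two parts: (1) Recommendation, (2) Cited passages."
    few_shot = ("Q: What triggers a change order?\n"
                "A: (1) Scope changes approved by the agency. (2) [Doc 1]")
    context = "\n".join(f"[Doc {i + 1}] {p}" for i, p in enumerate(passages))
    cot = "Reason step by step before giving the final answer."
    return "\n\n".join([persona, fmt, few_shot, context, cot, f"Q: {query}"])

corpus = [
    "Change orders require written agency approval before work proceeds.",
    "Concrete curing must be monitored for 28 days per specification.",
    "Incident logs are reviewed weekly by the site safety officer.",
]
query = "How are change orders approved?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt.splitlines()[0])  # the persona line leads the assembled prompt
```

The same assembled prompt would then be sent to the generative model, so that answers are grounded in the retrieved agency passages rather than model priors alone.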