Leveraging Large Language Models for Intelligent Construction Management Systems

Saved in:
Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Zhong, Yunshun
Published:
ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF
Description
Abstract: The Construction Management Systems (CMS) domain increasingly depends on unstructured text (inspection reports, technical documents, incident logs), creating an opportunity for domain-aware Natural Language Processing (NLP) with Large Language Models (LLMs). Yet general-domain pre-training often misses domain-specific terminology and context, limiting precision and accuracy on tasks in the CMS domain. This gap motivates domain-specific LLMs along two tracks: (i) a discriminative, encoder-based Transformer system for classification and regression on agency documents to support risk assessment, resource allocation, and cost estimation (Chapter 2); and (ii) a generative, decoder-based question-answering system that grounds answers in project documents while leveraging model priors (Chapter 3). The retrieval-augmented generation (RAG) and prompt-engineering pipeline of Chapter 3 is then adapted to various data-mining and analysis tasks across the domain, as shown in Chapters 4–5. Chapter 2 develops the first dedicated CMS corpus and an end-to-end pipeline for pre-training language models on domain text. After domain-specific pre-training and fine-tuning, these models outperform general models on two representative tasks, structural condition assessment and building compliance checking, with F1 improvements of 5.9% and 8.5%, respectively, underscoring the value of domain-specific pre-training. Chapter 3 builds an agency-specific project-authoring advisor using RAG with prompt engineering (persona, format template, chain-of-thought, few-shot learning). In an evaluation grounded in agency documentation, GPT-4 with RAG and optimized prompts scores 88.9/100, versus 75.7 with RAG only and 53.4 without RAG, and significantly surpasses conventional search methods.
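The record does not reproduce the Chapter 3 pipeline itself. As a minimal sketch under stated assumptions, the snippet below uses a toy bag-of-words retriever (a stand-in for a real embedding index) and shows how retrieved context might be combined with the prompt-engineering elements the abstract lists: persona, format template, few-shot example, and chain-of-thought cue. All names and document snippets are hypothetical.

```python
from collections import Counter
import math

def cosine_sim(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    return sorted(corpus, key=lambda d: cosine_sim(query, d), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a RAG prompt: persona, format template, few-shot
    example, retrieved context, and a chain-of-thought cue."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "You are an experienced construction project advisor.\n"        # persona
        "Answer in the format: Finding / Evidence / Recommendation.\n"  # format template
        "Example Q: What triggers a re-inspection?\n"
        "Example A: Finding: ... Evidence: ... Recommendation: ...\n"   # few-shot example
        f"Context from agency documents:\n{context}\n"
        f"Question: {query}\n"
        "Think step by step before answering."                          # chain-of-thought cue
    )

# Hypothetical agency-document snippets.
corpus = [
    "Concrete curing must be monitored for 28 days before load testing.",
    "Permit renewals are filed with the agency every two years.",
    "Scaffolding inspections are required after any severe weather event.",
]
prompt = build_prompt("How long must concrete cure before load testing?",
                      retrieve("concrete curing load testing", corpus))
```

In a production system the bag-of-words retriever would be replaced by a dense vector index over agency documents, and `prompt` would be sent to the LLM; the assembly pattern stays the same.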
Chapter 4 releases a public, metadata-rich dataset of 1,100 CMS publications with annual citation counts, then compares topic extraction via Latent Dirichlet Allocation (LDA) against an automated LLM-RAG pipeline, both evaluated against expert-labeled topics. The LLM-RAG approach achieves far higher agreement (85.94× on Jaccard, 8.14× on BLEU, 32.11× on ROUGE) and reveals research trends through joint analysis of topics and citations. Chapter 5 adapts the "Sleeping Beauty" framework to perform the first systematic analysis of papers with delayed recognition in CMS, showing that delayed recognition is more prevalent than commonly assumed and cautioning against short-horizon citation metrics. Collectively, this thesis demonstrates that combining domain-specific datasets and pre-training with RAG and specialized prompt engineering delivers accurate, auditable decision support, advancing evidence-based planning, regulatory compliance, and operational efficiency across the CMS sector.
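The Jaccard agreement score used in Chapter 4's comparison measures overlap between an extracted topic-keyword set and the expert-labeled set. A minimal illustration, with entirely hypothetical keyword sets (the real labels are not given in this record):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two keyword sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical topic keywords for one paper.
expert  = {"bim", "safety", "scheduling", "cost"}   # expert-labeled
lda     = {"model", "data", "cost"}                 # generic LDA terms
llm_rag = {"bim", "safety", "scheduling", "risk"}   # LLM-RAG output

lda_score, rag_score = jaccard(expert, lda), jaccard(expert, llm_rag)
```

Here `rag_score` (0.6) far exceeds `lda_score` (≈0.17), mirroring the direction, though not the magnitude, of the agreement gap reported in the abstract.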
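The "Sleeping Beauty" framework that Chapter 5 adapts is commonly operationalized via the beauty coefficient of Ke et al. (2015), which sums the deviation of a paper's yearly citation history from the straight line joining year 0 to the peak-citation year; whether the thesis uses this exact formulation is an assumption here. A sketch with made-up citation series:

```python
def beauty_coefficient(citations: list[int]) -> float:
    """Beauty coefficient B (Ke et al., 2015): sum over years t up to
    the citation peak t_m of (line(t) - c_t) / max(1, c_t), where
    line(t) interpolates between c_0 and the peak count c_{t_m}."""
    c0 = citations[0]
    tm = max(range(len(citations)), key=lambda t: citations[t])
    if tm == 0:
        return 0.0  # peak in year 0: no dormancy by definition
    slope = (citations[tm] - c0) / tm
    return sum((slope * t + c0 - citations[t]) / max(1, citations[t])
               for t in range(tm + 1))

# Hypothetical histories: long dormancy then a burst vs. steady growth.
sleeper = [0, 1, 0, 1, 2, 1, 2, 40]     # delayed recognition -> large B
steady  = [5, 10, 15, 20, 25, 30, 35, 40]  # linear growth -> B = 0
```

Papers with large B are flagged as "sleeping beauties"; Chapter 5's finding that such delayed recognition is common in CMS is what motivates its caution against short-horizon citation metrics.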
ISBN:9798265439529
Source: ProQuest Dissertations & Theses Global