An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications
| Published in: | arXiv.org (Apr 17, 2024), p. n/a |
|---|---|
| Published: | Cornell University Library, arXiv.org |
| Online access: | Citation/Abstract; full text outside of ProQuest |
| Abstract: | Automatic Program Repair (APR) has garnered significant attention as a practical research domain focused on automatically fixing bugs in programs. While existing APR techniques primarily target imperative programming languages like C and Java, there is a growing need for effective solutions applicable to declarative software specification languages. This paper presents a systematic investigation into the capacity of Large Language Models (LLMs) for repairing specifications written in Alloy, a declarative formal language for software specification. We propose a novel repair pipeline that integrates a dual-agent LLM framework, comprising a Repair Agent and a Prompt Agent. Through extensive empirical evaluation, we compare the effectiveness of LLM-based repair with state-of-the-art Alloy APR techniques on a comprehensive set of benchmarks. Our study reveals that LLMs, particularly GPT-4 variants, outperform existing techniques in repair efficacy, albeit with a marginal increase in runtime and token usage. This research advances the field of automatic repair for declarative specifications and highlights the promising potential of LLMs in this domain. |
| ISSN: | 2331-8422 |
| Source: | Engineering Database |
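
The abstract describes a dual-agent pipeline: a Prompt Agent that assembles the repair prompt and a Repair Agent that queries the LLM, with candidates checked against the Alloy Analyzer. The sketch below is a minimal illustration of that loop, not the authors' implementation; the helper names (`build_prompt`, `check_with_alloy`, `MAX_ITERATIONS`), the retry budget, and the Analyzer command line are all assumptions made for illustration.

```python
# Minimal sketch of a dual-agent LLM repair loop for Alloy specs.
# Assumptions: the official `openai` Python package, an OPENAI_API_KEY
# in the environment, and a locally available Alloy Analyzer jar whose
# exact CLI is a placeholder here.

import subprocess
import tempfile
from openai import OpenAI

client = OpenAI()
MAX_ITERATIONS = 5  # assumed retry budget; the paper's value may differ


def build_prompt(spec: str, failure_report: str) -> str:
    """Prompt Agent: fold the faulty spec and the Analyzer's failure
    report into a repair instruction for the LLM."""
    return (
        "The following Alloy specification is faulty.\n"
        f"Specification:\n{spec}\n"
        f"Analyzer output:\n{failure_report}\n"
        "Return only the corrected Alloy specification."
    )


def propose_repair(prompt: str) -> str:
    """Repair Agent: ask the LLM for a candidate fix."""
    response = client.chat.completions.create(
        model="gpt-4",  # the paper evaluates GPT-4 variants among others
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def check_with_alloy(spec: str) -> tuple[bool, str]:
    """Hypothetical oracle: run the Alloy Analyzer on the candidate and
    report whether it passes. The real invocation depends on how the
    Analyzer is packaged locally."""
    with tempfile.NamedTemporaryFile("w", suffix=".als", delete=False) as f:
        f.write(spec)
        path = f.name
    result = subprocess.run(
        ["java", "-jar", "alloy.jar", path],  # placeholder CLI
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr


def repair(spec: str, failure_report: str) -> str | None:
    """Iterate until a candidate validates or the budget runs out."""
    for _ in range(MAX_ITERATIONS):
        candidate = propose_repair(build_prompt(spec, failure_report))
        ok, report = check_with_alloy(candidate)
        if ok:
            return candidate
        # Feed the new failure report back so the next prompt is refined.
        spec, failure_report = candidate, report
    return None
```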