FDM-Bench: A Comprehensive Benchmark for Evaluating Large Language Models in Additive Manufacturing Tasks

Bibliographic Details
Published in: arXiv.org (Dec 13, 2024), p. n/a
Main Author: Eslaminia, Ahmadreza
Other Authors: Jackson, Adrian; Tian, Beitong; Stern, Avi; Gordon, Hallie; Malhotra, Rajiv; Nahrstedt, Klara; Shao, Chenhui
Published:
Cornell University Library, arXiv.org
Subjects:
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3145272609
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3145272609 
045 0 |b d20241213 
100 1 |a Eslaminia, Ahmadreza 
245 1 |a FDM-Bench: A Comprehensive Benchmark for Evaluating Large Language Models in Additive Manufacturing Tasks 
260 |b Cornell University Library, arXiv.org  |c Dec 13, 2024 
513 |a Working Paper 
520 3 |a Fused Deposition Modeling (FDM) is a widely used additive manufacturing (AM) technique valued for its flexibility and cost-efficiency, with applications in a variety of industries including healthcare and aerospace. Recent developments have made affordable FDM machines accessible and encouraged adoption among diverse users. However, the design, planning, and production processes in FDM require specialized interdisciplinary knowledge. Managing the complex parameters and resolving print defects in FDM remain challenging. These technical complexities form the most critical barrier preventing individuals without technical backgrounds and even professional engineers without training in other domains from participating in AM design and manufacturing. Large Language Models (LLMs), with their advanced capabilities in text and code processing, offer the potential for addressing these challenges in FDM. However, existing research on LLM applications in this field is limited, typically focusing on specific use cases without providing comprehensive evaluations across multiple models and tasks. To this end, we introduce FDM-Bench, a benchmark dataset designed to evaluate LLMs on FDM-specific tasks. FDM-Bench enables a thorough assessment by including user queries across various experience levels and G-code samples that represent a range of anomalies. We evaluate two closed-source models (GPT-4o and Claude 3.5 Sonnet) and two open-source models (Llama-3.1-70B and Llama-3.1-405B) on FDM-Bench. A panel of FDM experts assesses the models' responses to user queries in detail. Results indicate that closed-source models generally outperform open-source models in G-code anomaly detection, whereas Llama-3.1-405B demonstrates a slight advantage over other models in responding to user queries. These findings underscore FDM-Bench's potential as a foundational tool for advancing research on LLM capabilities in FDM. 
653 |a Knowledge management 
653 |a Fused deposition modeling 
653 |a Source code 
653 |a Anomalies 
653 |a Large language models 
653 |a Manufacturing 
653 |a Queries 
653 |a G codes 
653 |a Additive manufacturing 
653 |a Task complexity 
653 |a Benchmarks 
700 1 |a Jackson, Adrian 
700 1 |a Tian, Beitong 
700 1 |a Stern, Avi 
700 1 |a Gordon, Hallie 
700 1 |a Malhotra, Rajiv 
700 1 |a Nahrstedt, Klara 
700 1 |a Shao, Chenhui 
773 0 |t arXiv.org  |g (Dec 13, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3145272609/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.09819