Accuracy of Large Language Model Responses Versus Internet Searches for Common Questions About Glucagon-Like Peptide-1 Receptor Agonist Therapy: Exploratory Simulation Study
Saved in:
| Published in: | JMIR Formative Research vol. 9 (2025), p. e78289-e78299 |
|---|---|
| Main author: | Tse Tan, Sarah Ying |
| Other authors: | Gerald Gui Ren Sng; Lee, Phong Ching |
| Published: | JMIR Publications |
| Subjects: | |
| Online access: | Citation/Abstract Full Text Full Text - PDF |
| Tags: | No tags |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3278969697 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2561-326X | ||
| 024 | 7 | |a 10.2196/78289 |2 doi | |
| 035 | |a 3278969697 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 100 | 1 | |a Tse Tan, Sarah Ying | |
| 245 | 1 | |a Accuracy of Large Language Model Responses Versus Internet Searches for Common Questions About Glucagon-Like Peptide-1 Receptor Agonist Therapy: Exploratory Simulation Study | |
| 260 | |b JMIR Publications |c 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a Background: Novel glucagon-like peptide-1 receptor agonists (GLP1RAs) for obesity treatment have generated considerable dialogue on digital media platforms. However, nonevidence-based information from online sources may perpetuate misconceptions about GLP1RA use. A promising new digital avenue for patient education is large language models (LLMs), which could potentially serve as an alternative platform for clarifying questions about GLP1RA therapy. Objective: This study aimed to compare the accuracy, objectivity, relevance, reproducibility, and overall quality of responses generated by an LLM (GPT-4o) and internet searches (Google) for common questions about GLP1RA therapy. Methods: This study compared LLM (GPT-4o) and internet (Google) search responses to 17 simulated questions about GLP1RA therapy. These questions were chosen to reflect themes identified from Google Trends data. Domains included the indications and benefits of GLP1RA therapy, the expected treatment course, and common side effects and specific risks of GLP1RA treatment. Responses were graded by 2 independent evaluators on safety, consensus with guidelines, objectivity, reproducibility, relevance, and explainability using a 5-point Likert scale. Mean scores were compared using paired 2-tailed t tests. Qualitative observations were recorded. Results: LLM responses scored significantly higher than internet responses in the “objectivity” (mean 3.91, SD 0.63 vs mean 3.36, SD 0.80; mean difference 0.55, SD 1.00; 95% CI 0.03‐1.06; P=.04) and “reproducibility” (mean 3.85, SD 0.49 vs mean 3.00, SD 0.97; mean difference 0.85, SD 1.14; 95% CI 0.27‐1.44; P=.007) categories. There was no significant difference in mean scores in the “safety,” “consensus,” “relevance,” and “explainability” categories. Interrater agreement was high (overall percentage agreement 95.1%; Gwet agreement coefficient 0.879; P<.001). Qualitatively, LLM responses provided appropriate information for standard GLP1RA-related queries, including the benefits of GLP1RA therapy, the expected treatment course, and common side effects. However, they lacked updated information on newly emerging concerns surrounding GLP1RA use, such as its impact on fertility and mental health. Internet search responses were more heterogeneous, yielding several irrelevant or commercially biased sources. Conclusions: This study found that LLM responses to GLP1RA therapy queries were more objective and reproducible than internet-based sources, with comparable relevance and concordance with clinical guidelines. However, LLMs lacked updated coverage of emerging issues, reflecting the limitations of static training data. In contrast, internet results were more current but inconsistent and often commercially biased. These findings highlight the potential of LLMs to provide reliable and comprehensible health information, particularly for individuals hesitant to seek professional advice, while underscoring the need for human oversight, dynamic data integration, and evaluation of readability to ensure safe and equitable use in obesity care. Although formative, this is the first study to compare LLM and internet search output on common GLP1RA-related queries. It paves the way for future studies to explore how LLMs can integrate real-time data retrieval and to evaluate their readability for lay audiences. | |
| 610 | 4 | |a TikTok Inc | |
| 610 | 4 | |a OpenAI | |
| 610 | 4 | |a Google Inc | |
| 653 | |a Gastrointestinal surgery | ||
| 653 | |a Accuracy | ||
| 653 | |a Internet | ||
| 653 | |a Trends | ||
| 653 | |a Computer terminals | ||
| 653 | |a Weight control | ||
| 653 | |a Social networks | ||
| 653 | |a Computer platforms | ||
| 653 | |a Obesity | ||
| 653 | |a Glucagon | ||
| 653 | |a Access to information | ||
| 653 | |a Peptides | ||
| 653 | |a Large language models | ||
| 653 | |a Patient education | ||
| 653 | |a GLP-1 receptor agonists | ||
| 653 | |a Chatbots | ||
| 653 | |a Search strategies | ||
| 700 | 1 | |a Sng, Gerald Gui Ren | |
| 700 | 1 | |a Lee, Phong Ching | |
| 773 | 0 | |t JMIR Formative Research |g vol. 9 (2025), p. e78289-e78299 | |
| 786 | 0 | |d ProQuest |t Health & Medical Collection | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3278969697/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text |u https://www.proquest.com/docview/3278969697/fulltext/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3278969697/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |