Accuracy of Large Language Models for Infective Endocarditis Prophylaxis in Dental Procedures.
摘要:
Infective endocarditis (IE) is a serious, life-threatening condition requiring antibiotic prophylaxis for high-risk individuals undergoing invasive dental procedures. As LLMs are rapidly adopted by dental professionals for their efficiency and accessibility, assessing their accuracy in answering critical questions about antibiotic prophylaxis for IE prevention is crucial. Twenty-eight true/false questions based on the 2021 American Heart Association (AHA) guidelines for IE were posed to 7 popular LLMs. Each model underwent five independent runs per question using two prompt strategies: a pre-prompt as an experienced dentist and without a pre-prompt. Inter-model comparisons utilised the Kruskal-Wallis test, followed by post-hoc pairwise comparisons using Prism 10 software. Significant differences in accuracy were observed among the LLMs. All LLMs had a narrower confidence interval with a pre-prompt, and most, except Claude 3 Opus, showed improved performance. GPT-4o had the highest accuracy (80% with a pre-prompt, 78.57% without), followed by Gemini 1.5 Pro (78.57% and 77.86%) and Claude 3 Opus (75.71% and 77.14%). Gemini 1.5 Flash had the lowest accuracy (68.57% and 63.57%). Without a pre-prompt, Gemini 1.5 Flash's accuracy was significantly lower than Claude 3 Opus, Gemini 1.5 Pro, and GPT-4o. With a pre-prompt, Gemini 1.5 Flash and Claude 3.5 were significantly less accurate than Gemini 1.5 Pro and GPT-4o. None of the LLMs met the commonly used benchmark scores. All models provided both correct and incorrect answers randomly, except Claude 3.5 Sonnet with a pre-prompt, which consistently gave incorrect answers to eight questions across five runs. LLMs like GPT-4o show promise for retrieving AHA-IE guideline information, achieving up to 80% accuracy. However, complex medical questions may still pose a challenge. Pre-prompts offer a potential solution, and domain-specific training is essential for optimizing LLM performance in healthcare, especially with the emergence of models with increased token limits.
收起
展开
DOI:
10.1016/j.identj.2024.09.033
被引量:
年份:
1970


通过 文献互助 平台发起求助,成功后即可免费获取论文全文。
求助方法1:
知识发现用户
每天可免费求助50篇
求助方法1:
关注微信公众号
每天可免费求助2篇
求助方法2:
完成求助需要支付5财富值
您目前有 1000 财富值
相似文献(100)
参考文献(21)
引证文献(5)
来源期刊
影响因子:暂无数据
JCR分区: 暂无
中科院分区:暂无