ChatGPT's Performance in Cardiac Arrest and Bradycardia Simulations Using the American Heart Association's Advanced Cardiovascular Life Support Guidelines: Exploratory Study.-Z研学术

ChatGPT's Performance in Cardiac Arrest and Bradycardia Simulations Using the American Heart Association's Advanced Cardiovascular Life Support Guidelines: Exploratory Study.

来自 PUBMED

作者：

Pham C ， Govender R ， Tehami S ， Chavez S ， Adepoju OE ， Liaw W

展开 

摘要：

ChatGPT is the most advanced large language model to date, with prior iterations having passed medical licensing examinations, providing clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT's performance found that artificial intelligence could pass the American Heart Association's advanced cardiovascular life support (ACLS) examinations with modifications. ChatGPT's accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical. This study aims to examine the accuracy of ChatGPT in following ACLS guidelines for bradycardia and cardiac arrest. We evaluated the accuracy of ChatGPT's responses to 2 simulations based on the 2020 American Heart Association ACLS guidelines with 3 primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT was scored for correctness (1 point) or incorrectness (0 points). Each simulation was conducted 20 times. ChatGPT's median accuracy for each step was 85% (IQR 40%-100%) for cardiac arrest and 30% (IQR 13%-81%) for bradycardia. ChatGPT's median accuracy over 20 simulation attempts for cardiac arrest was 69% (IQR 67%-74%) and for bradycardia was 42% (IQR 33%-50%). We found that ChatGPT's outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented. This study highlights the need for consistent and reliable guidance to prevent potential medical errors and optimize the application of ChatGPT to enhance its reliability and effectiveness in clinical practice.

收起

展开 

DOI：

10.2196/55037

被引量：

年份：

1970

全部来源

SCI-Hub (全网免费下载)

发表链接

ResearchGate (全网免费下载)

钛学术 (全网免费下载)

通过文献互助平台发起求助，成功后即可免费获取论文全文。

查看求助

求助方法1：

知识发现用户

每天可免费求助50篇

求助

求助方法1：

关注微信公众号

每天可免费求助2篇

求助方法2：

求助需要支付5个财富值

您现在财富值不足

您可以通过应助全文获取财富值

求助方法2：

完成求助需要支付5财富值

您目前有 1000 财富值

求助

我们已与文献出版商建立了直接购买合作。

你可以通过身份认证进行实名认证，认证成功后本次下载的费用将由您所在的图书馆支付

您可以直接购买此文献，1~5分钟即可下载全文，部分资源由于网络原因可能需要更长时间，请您耐心等待哦~

身份认证全文购买

相似文献(311)

参考文献(20)

引证文献(2)

ChatGPT's Performance in Cardiac Arrest and Bradycardia Simulations Using the American Heart Association's Advanced Cardiovascular Life Support Guidelines: Exploratory Study.

ChatGPT is the most advanced large language model to date, with prior iterations having passed medical licensing examinations, providing clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT's performance found that artificial intelligence could pass the American Heart Association's advanced cardiovascular life support (ACLS) examinations with modifications. ChatGPT's accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical. This study aims to examine the accuracy of ChatGPT in following ACLS guidelines for bradycardia and cardiac arrest. We evaluated the accuracy of ChatGPT's responses to 2 simulations based on the 2020 American Heart Association ACLS guidelines with 3 primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT was scored for correctness (1 point) or incorrectness (0 points). Each simulation was conducted 20 times. ChatGPT's median accuracy for each step was 85% (IQR 40%-100%) for cardiac arrest and 30% (IQR 13%-81%) for bradycardia. ChatGPT's median accuracy over 20 simulation attempts for cardiac arrest was 69% (IQR 67%-74%) and for bradycardia was 42% (IQR 33%-50%). We found that ChatGPT's outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented. This study highlights the need for consistent and reliable guidance to prevent potential medical errors and optimize the application of ChatGPT to enhance its reliability and effectiveness in clinical practice.

Pham C ，Govender R ，Tehami S ，Chavez S ，Adepoju OE ，Liaw W ... - 《JOURNAL OF MEDICAL INTERNET RESEARCH》

被引量: 2 发表:1970年
ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice.

Riedel M ，Kaefinger K ，Stuehrenberg A ，Ritter V ，Amann N ，Graf A ，Recker F ，Klein E ，Kiechle M ，Riedel F ，Meyer B ... - 《Frontiers in Medicine》

被引量: 5 发表:1970年
ChatGPT's adherence to otolaryngology clinical practice guidelines.

Large language models, including ChatGPT, has the potential to transform the way we approach medical knowledge, yet accuracy in clinical topics is critical. Here we assessed ChatGPT's performance in adhering to the American Academy of Otolaryngology-Head and Neck Surgery guidelines. We presented ChatGPT with 24 clinical otolaryngology questions based on the guidelines of the American Academy of Otolaryngology. This was done three times (N = 72) to test the model's consistency. Two otolaryngologists evaluated the responses for accuracy and relevance to the guidelines. Cohen's Kappa was used to measure evaluator agreement, and Cronbach's alpha assessed the consistency of ChatGPT's responses. The study revealed mixed results; 59.7% (43/72) of ChatGPT's responses were highly accurate, while only 2.8% (2/72) directly contradicted the guidelines. The model showed 100% accuracy in Head and Neck, but lower accuracy in Rhinology and Otology/Neurotology (66%), Laryngology (50%), and Pediatrics (8%). The model's responses were consistent in 17/24 (70.8%), with a Cronbach's alpha value of 0.87, indicating a reasonable consistency across tests. Using a guideline-based set of structured questions, ChatGPT demonstrates consistency but variable accuracy in otolaryngology. Its lower performance in some areas, especially Pediatrics, suggests that further rigorous evaluation is needed before considering real-world clinical use.

Tessler I ，Wolfovitz A ，Alon EE ，Gecel NA ，Livneh N ，Zimlichman E ，Klang E ... - 《-》

被引量: - 发表:1970年
Sailing the Seven Seas: A Multinational Comparison of ChatGPT's Performance on Medical Licensing Examinations.

Alfertshofer M ，Hoch CC ，Funk PF ，Hollmann K ，Wollenberg B ，Knoedler S ，Knoedler L ... - 《-》

被引量: 18 发表:1970年
Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.

Investigates the integration of an artificial intelligence tool, specifically ChatGPT, in nursing education, addressing its effectiveness in exam preparation and self-assessment. This study aims to evaluate the performance of ChatGPT, one of the most promising artificial intelligence-driven linguistic understanding tools in answering question banks for nursing licensing examination preparation. It further analyzes question characteristics that might impact the accuracy of ChatGPT-generated answers and examines its reliability through human expert reviews. Cross-sectional survey comparing ChatGPT-generated answers and their explanations. 400 questions from Taiwan's 2022 Nursing Licensing Exam. The study analyzed 400 questions from five distinct subjects of Taiwan's 2022 Nursing Licensing Exam using the ChatGPT model which provided answers and in-depth explanations for each question. The impact of various question characteristics, such as type and cognitive level, on the accuracy of the ChatGPT-generated responses was assessed using logistic regression analysis. Additionally, human experts evaluated the explanations for each question, comparing them with the ChatGPT-generated answers to determine consistency. ChatGPT exhibited overall accuracy at 80.75 % for Taiwan's National Nursing Exam, which passes the exam. The accuracy of ChatGPT-generated answers diverged significantly across test subjects, demonstrating a hierarchy ranging from General Medicine at 88.75 %, Medical-Surgical Nursing at 80.0 %, Psychology and Community Nursing at 70.0 %, Obstetrics and Gynecology Nursing at 67.5 %, down to Basic Nursing at 63.0 %. ChatGPT had a higher probability of eliciting incorrect responses for questions with certain characteristics, notably those with clinical vignettes [odds ratio 2.19, 95 % confidence interval 1.24-3.87, P = 0.007] and complex multiple-choice questions [odds ratio 2.37, 95 % confidence interval 1.00-5.60, P = 0.049]. Furthermore, 14.25 % of ChatGPT-generated answers were inconsistent with their explanations, leading to a reduction in the overall accuracy to 74 %. This study reveals the ChatGPT's capabilities and limitations in nursing exam preparation, underscoring its potential as an auxiliary educational tool. It highlights the model's varied performance across different question types and notable inconsistencies between its answers and explanations. The study contributes significantly to the understanding of artificial intelligence in learning environments, guiding the future development of more effective and reliable artificial intelligence-based educational technologies. New study reveals ChatGPT's potential and challenges in nursing education: Achieves 80.75 % accuracy in exam prep but faces hurdles with complex questions and logical consistency. #AIinNursing #AIinEducation #NursingExams #ChatGPT.

Su MC ，Lin LE ，Lin LH ，Chen YC ... - 《-》

被引量: 2 发表:1970年

加载更多

来源期刊

JOURNAL OF MEDICAL INTERNET RESEARCH

影响因子：7.069

JCR分区：暂无

中科院分区：暂无