Evaluating the Efficacy of Large Language Models in Generating Medical Documentation: A Comparative Study of ChatGPT-4, ChatGPT-4o, and Claude.
Abstract:
Large language models (LLMs) have demonstrated transformative potential in health care. They can enhance clinical and academic medicine by facilitating accurate diagnoses, interpreting laboratory results, and automating documentation processes. This study evaluates the efficacy of three leading LLMs (ChatGPT-4, ChatGPT-4o, and Claude) in generating surgical operation reports and discharge summaries, focusing on accuracy, efficiency, and quality. Each model was given six prompts, and its responses were analyzed for readability and output quality, with validation by plastic surgeons. Readability was measured with the Flesch-Kincaid score, the Flesch reading ease score, and the Coleman-Liau Index, while reliability was evaluated using the DISCERN score. Paired two-tailed t-tests (significance threshold p<0.05) compared these metrics, and the time taken to generate operation reports and discharge summaries, against the authors' own results. Table 3 shows statistically significant differences in readability between ChatGPT-4o and Claude across all metrics, while ChatGPT-4 and Claude differ significantly in the Flesch reading ease score and the Coleman-Liau Index. Table 6 reveals extremely low p-values across BL, IS, and MM for all models, with Claude consistently outperforming both ChatGPT-4 and ChatGPT-4o. Claude also generated documents the fastest, completing tasks in approximately 10 to 14 s. These results suggest that Claude not only excels in readability but also demonstrates superior reliability and speed, making it an efficient choice for practical applications. The study highlights the importance of selecting appropriate LLMs for clinical use. Integrating these LLMs can streamline healthcare documentation, improve efficiency, and enhance patient outcomes through clearer communication and more accurate medical reports.
This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
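The readability metrics and the paired t-test named in the abstract follow standard published formulas, which can be sketched as below. This is a minimal illustration and not the authors' tooling, which the abstract does not describe. The syllable counter is a rough vowel-group heuristic (an assumption; dedicated readability tools use more careful rules), the function names are hypothetical, and the sample sentence is invented for demonstration only.

```python
import math
import re
import statistics


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Count sentences, words, letters, and (estimated) syllables in a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": sum(ch.isalpha() for w in words for ch in w),
        "syllables": sum(count_syllables(w) for w in words),
    }


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Higher scores indicate easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Approximate US school grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def readability(text: str) -> dict:
    """Compute all three scores for one text."""
    s = text_stats(text)
    return {
        "flesch_reading_ease": flesch_reading_ease(s["words"], s["sentences"], s["syllables"]),
        "flesch_kincaid_grade": flesch_kincaid_grade(s["words"], s["sentences"], s["syllables"]),
        "coleman_liau_index": coleman_liau_index(s["letters"], s["words"], s["sentences"]),
    }


def paired_t_statistic(a: list, b: list) -> float:
    """t statistic of a paired t-test: mean difference over its standard error."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the study.
    sample = "The wound was closed in layers. The patient was discharged the next day."
    for name, value in readability(sample).items():
        print(f"{name}: {value:.2f}")
```

The two-tailed p-value would then come from the t distribution with n - 1 degrees of freedom, which is left to a statistics package here because the Python standard library does not provide that distribution.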
DOI: 10.1007/s00266-025-04842-8