Diagnostic performances of Claude 3 Opus and Claude 3.5 Sonnet from patient history and key images in Radiology's "Diagnosis Please" cases.
Abstract:
The diagnostic performance of large language artificial intelligence (AI) models when given radiological images has yet to be investigated. We used Claude 3 Opus (released on March 4, 2024) and Claude 3.5 Sonnet (released on June 21, 2024) to investigate their diagnostic performance on Radiology's "Diagnosis Please" quiz questions. The AI models were tasked with listing the primary diagnosis and two differential diagnoses for 322 quiz questions (cases 1 to 322, published from 1998 to 2023). The analyses were performed under three conditions: (1) Condition 1: submitter-provided clinical history (text) alone; (2) Condition 2: submitter-provided clinical history and imaging findings (text); (3) Condition 3: clinical history (text) and key images (PNG files). We applied McNemar's test to evaluate differences in the correct response rates among Conditions 1, 2, and 3 for each model and between the models. For Claude 3 Opus and Claude 3.5 Sonnet, respectively, the correct diagnosis rates were 58/322 (18.0%) and 69/322 (21.4%) under Condition 1, 201/322 (62.4%) and 209/322 (64.9%) under Condition 2, and 80/322 (24.8%) and 97/322 (30.1%) under Condition 3. The models provided the correct answer as a differential diagnosis in up to 26/322 (8.1%) cases for Opus and 23/322 (7.1%) for Sonnet. Statistically significant differences were observed in the correct response rates among all combinations of Conditions 1, 2, and 3 for each model (p < 0.01). Claude 3.5 Sonnet outperformed Claude 3 Opus in all conditions, but the difference was statistically significant only for Condition 3 (30.1% vs. 24.8%, p = 0.028). Both AI models demonstrated significantly improved diagnostic performance when given key images together with the clinical history, compared with clinical history alone. The models' ability to identify important differential diagnoses under these conditions was also confirmed.
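As a rough illustration of the statistics described in the abstract, the following Python sketch recomputes the reported percentages from the reported counts and shows one way a paired comparison with McNemar's test can be carried out. Several points are assumptions rather than details from the paper: the abstract does not say whether the exact (binomial) or the chi-squared form of the test was used, so the exact two-sided form is shown here; the function names `rate` and `mcnemar_exact` are hypothetical; and the discordant-pair counts in the final call are invented for illustration, because the abstract reports only marginal totals.

```python
from math import comb

N_CASES = 322  # number of quiz cases reported in the abstract

# Correct primary diagnoses reported in the abstract: (Claude 3 Opus, Claude 3.5 Sonnet)
CORRECT = {
    "Condition 1": (58, 69),
    "Condition 2": (201, 209),
    "Condition 3": (80, 97),
}


def rate(correct, total=N_CASES):
    """Return the correct-response rate as a percentage, rounded to one decimal."""
    return round(100 * correct / total, 1)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: cases the first rater got right and the second got wrong
    c: cases the second rater got right and the first got wrong
    Under the null hypothesis, min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


for condition, (opus, sonnet) in CORRECT.items():
    print(condition, f"Opus {rate(opus)}%", f"Sonnet {rate(sonnet)}%")

# HYPOTHETICAL discordant counts, not taken from the paper: the test needs the
# per-case split of disagreements, which the abstract does not report.
print(mcnemar_exact(20, 37))
```

Only the discordant pairs enter the test, which is why the marginal totals in the abstract are not enough to reproduce the published p-values.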
DOI: 10.1007/s11604-024-01634-z