Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks.

From PubMed

Authors:

Gill GS, Tsai J, Moxam J, Sanghvi HA, Gupta S

Abstract:

Background: With advancements in natural language processing, tools such as Chat Generative Pre-Trained Transformer (ChatGPT) version 4.0 and Google's Gemini Advanced are increasingly being evaluated for their potential in various medical applications. The objective of this study was to systematically assess the performance of these large language models (LLMs) on both image-based and non-image-based questions within the specialized field of Ophthalmology. We used a review question bank for the Ophthalmic Knowledge Assessment Program (OKAP), used nationally by ophthalmology residents to prepare for the ophthalmology board examination, to assess the accuracy and performance of ChatGPT and Gemini Advanced.

Methodology: A total of 260 randomly generated multiple-choice questions from the OphthoQuestions question bank were run through ChatGPT and Gemini Advanced, forming a simulated 260-question OKAP examination created at random from the bank. Question-specific data were analyzed, including overall percent correct, subspecialty accuracy, whether the question was "high yield," difficulty (rated 1-4), and question type (e.g., image, text). To compare the performance of ChatGPT-4.0 and Gemini Advanced by question difficulty, we used the standard deviation of user answer choices to determine difficulty. Statistical analysis was conducted in Google Sheets using two-tailed t-tests with unequal variance to compare ChatGPT-4.0 and Gemini Advanced across question types, subspecialties, and difficulty levels.

Results: In total, 259 of the 260 questions were included in the study, as one question used a video that no form of ChatGPT could interpret as of May 1, 2024. ChatGPT-4.0 correctly answered 57.14% (148/259, p < 0.018), and Gemini Advanced correctly answered 46.72% (121/259, p < 0.018). Both models answered most questions without a prompt and would have received a below-average score on the OKAP. ChatGPT-4.0 required a secondary prompt on 27 questions, compared with 67 questions for Gemini Advanced. ChatGPT-4.0 correctly answered 68.99% of easier questions (<2 on the 1-4 scale) and 44.96% of harder questions (>2 on the 1-4 scale); Gemini Advanced correctly answered 49.61% of easier questions and 44.19% of harder questions. The difference in accuracy between ChatGPT-4.0 and Gemini Advanced was statistically significant for easy questions (p < 0.0015) but not for hard questions (p < 0.55). For image-based questions, ChatGPT-4.0 correctly answered 39.58% (19/48) and Gemini Advanced 33.33% (16/48), a statistically insignificant difference between the two models (p < 0.530). Comparing text-only against image-based questions demonstrated a statistically significant difference in accuracy for both ChatGPT-4.0 (p < 0.013) and Gemini Advanced (p < 0.022).

Conclusions: This study provides evidence that ChatGPT-4.0 performs better than Gemini Advanced on OKAP-style examinations in the context of ophthalmic multiple-choice questions. This may indicate an opportunity for greater utility of ChatGPT in ophthalmic medical education. While these models show promise within medical education, caution is warranted, as a more detailed evaluation of their reliability is needed.
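The abstract's comparison of the two models rests on two-tailed t-tests with unequal variance (Welch's t-test) applied to per-question correctness. As a minimal sketch of that analysis, assuming SciPy is available and using the overall counts reported above (148/259 vs. 121/259) — the function name `compare_models` and the encoding of each question as a 0/1 correctness value are illustrative assumptions, not the authors' actual spreadsheet:

```python
# Hedged sketch of an unequal-variance (Welch's) two-tailed t-test on
# per-question correctness, as described in the Methodology section.
# Assumes scipy/numpy; encodes each question as 1 (correct) or 0 (incorrect).
import numpy as np
from scipy.stats import ttest_ind

def compare_models(correct_a, total_a, correct_b, total_b):
    """Welch's two-tailed t-test on binary correctness vectors."""
    a = np.array([1] * correct_a + [0] * (total_a - correct_a))
    b = np.array([1] * correct_b + [0] * (total_b - correct_b))
    # equal_var=False selects the unequal-variance (Welch) form of the test
    t_stat, p_value = ttest_ind(a, b, equal_var=False)
    return t_stat, p_value

# Overall counts from the abstract: ChatGPT-4.0 148/259, Gemini Advanced 121/259
t_stat, p_value = compare_models(148, 259, 121, 259)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

With these counts the difference in proportions (57.14% vs. 46.72%) yields a p-value in the vicinity of the abstract's reported p < 0.018; the same function could be applied per subspecialty or per difficulty bucket.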


DOI:

10.7759/cureus.69612

Citations:

0

Year:

2024



Journal:

Cureus
