Comparison of Answers between ChatGPT and Human Dieticians to Common Nutrition Questions.
More people than ever seek nutrition information from online sources. The chatbot ChatGPT has seen staggering popularity since its inception and may become a resource for nutrition information. However, ChatGPT's adequacy in answering questions in the field of nutrition has not been investigated. Thus, the aim of this research was to investigate the competency of ChatGPT in answering common nutrition questions.
Dieticians were asked to provide their most commonly asked nutrition questions and their own answers to them. We then posed the same questions to ChatGPT and sent both sets of answers to other dieticians (N = 18) or to nutritionists and experts in the domain of each question (N = 9), who graded them on scientific correctness, actionability, and comprehensibility. The grades were also averaged to give an overall score, and group means of the answers to each question were compared using permutation tests.
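As a point of reference, the permutation comparison described above can be sketched in a few lines of Python. The grade values and group sizes below are hypothetical placeholders, not the study's data, and the authors' exact implementation is not reported in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(chatgpt_grades, dietician_grades, n_perm=10_000):
    """Two-sided permutation test on the difference in mean grades.

    Pools both groups, repeatedly reshuffles the pooled grades into two
    groups of the original sizes, and counts how often the shuffled mean
    difference is at least as extreme as the observed one.
    """
    a = np.asarray(chatgpt_grades, dtype=float)
    b = np.asarray(dietician_grades, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # permute group labels
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, hits / n_perm

# Hypothetical rater grades for one question (not the study's data).
chatgpt_grades = [5, 4, 5, 4, 5, 4]
dietician_grades = [4, 3, 4, 3, 4, 4]
diff, p = permutation_test(chatgpt_grades, dietician_grades)
print(f"mean difference = {diff:.2f}, permutation p = {p:.3f}")
```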
ChatGPT's answers received higher overall scores than the dieticians' answers for five of the eight questions we received. ChatGPT also received higher grades on five occasions for scientific correctness, four for actionability, and five for comprehensibility. In contrast, none of the dieticians' answers had a higher average score than ChatGPT's for any question, either overall or for any of the grading components.
Our results suggest that ChatGPT can be used to answer nutrition questions frequently asked of dieticians and provide encouraging evidence for the role of chatbots in offering nutrition support.
Kirk D, van Eijnatten E, Camps G
《-》
Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.
Purpose: Artificial intelligence (AI) has rapidly gained popularity with the growth of ChatGPT (OpenAI, San Francisco, USA) and other large-language-model chatbots, and these programs have tremendous potential to impact medicine. One important consequence for medicine and public health is that patients may use these programs in search of answers to medical questions. Despite the public's increased use of AI chatbots, there is little research assessing the reliability of ChatGPT and alternative programs when queried for medical information. This study seeks to elucidate the accuracy and readability of AI chatbot answers to patient questions regarding urology. As vasectomy is one of the most common urologic procedures, the study investigates AI-generated responses to frequently asked vasectomy-related questions using five popular, free-to-access AI platforms.
Methods: Fifteen vasectomy-related questions were individually queried to five AI chatbots from November to December 2023: ChatGPT (OpenAI, San Francisco, USA), Bard (Google Inc., Mountain View, USA), Bing (Microsoft, Redmond, USA), Perplexity (Perplexity AI Inc., San Francisco, USA), and Claude (Anthropic, San Francisco, USA). Responses from each platform were graded by two attending urologists, two urology research faculty, and one urology resident physician on a 1-6 Likert scale (1 = completely inaccurate, 6 = completely accurate) based on comparison with existing American Urological Association guidelines. Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES, 1-100) were calculated for each response. To assess differences in Likert, FRES, and FKGL scores, Kruskal-Wallis tests were performed using GraphPad Prism V10.1.0 (GraphPad, San Diego, USA) with alpha set at 0.05.
Results: ChatGPT provided the most accurate responses of the five AI chatbots, with an average Likert score of 5.04, followed by Microsoft Bing (4.91), Anthropic Claude (4.65), Google Bard (4.43), and Perplexity (4.41). All five chatbots scored, on average, at least 4.41, corresponding to a rating of at least "somewhat accurate." Google Bard received the highest Flesch Reading Ease score (49.67) and the lowest grade level (10.1) of the chatbots. Anthropic Claude scored 46.7 on the FRES and 10.55 on the FKGL, Microsoft Bing 45.57 and 11.56, and Perplexity 36.4 and 13.29. ChatGPT had the lowest FRES (30.4) and the highest FKGL (14.2).
Conclusion: This study investigates the use of AI in medicine, specifically urology, and helps to determine whether large-language-model chatbots can be reliable sources of freely available medical information. All five AI chatbots, on average, achieved at least "somewhat accurate" on the 6-point Likert scale. In terms of readability, all five chatbots, on average, had Flesch Reading Ease scores below 50 and grade levels above the 10th grade. In this small-scale study, several significant differences were identified between the readability scores of the AI chatbots, but no significant differences were found among their accuracies. Thus, our study suggests that the major AI chatbots may perform similarly in their ability to be correct but differ in how easily they are comprehended by the general public.
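For readers unfamiliar with the metrics and test named in the methods, the standard Flesch formulas and a Kruskal-Wallis comparison can be sketched as follows. The word, sentence, syllable, and Likert values below are hypothetical placeholders, and the sketch uses scipy rather than the GraphPad Prism software the authors report.

```python
from scipy.stats import kruskal

def flesch_scores(words: int, sentences: int, syllables: int) -> tuple[float, float]:
    """Standard Flesch Reading Ease (FRES) and Flesch-Kincaid Grade Level (FKGL)."""
    wps = words / sentences      # average words per sentence
    spw = syllables / words      # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl

# Hypothetical counts for one chatbot response (not the study's data).
fres, fkgl = flesch_scores(words=180, sentences=9, syllables=310)
print(f"FRES = {fres:.1f}, FKGL = {fkgl:.1f}")

# Hypothetical per-question Likert accuracy grades (1-6) for each chatbot.
likert = {
    "ChatGPT":    [5, 6, 5, 4, 5],
    "Bing":       [5, 5, 4, 5, 5],
    "Claude":     [4, 5, 5, 4, 5],
    "Bard":       [4, 4, 5, 4, 5],
    "Perplexity": [4, 5, 4, 4, 5],
}
stat, p = kruskal(*likert.values())  # omnibus test across the five chatbots
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.3f}")
```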
Carlson JA, Cheng RZ, Lange A, Nagalakshmi N, Rabets J, Shah T, Sindhwani P
《Cureus》
Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer 177Lu-PSMA-617 therapy.
Many patients use artificial intelligence (AI) chatbots as a rapid source of health information. This raises important questions about the reliability and effectiveness of AI chatbots in delivering accurate and understandable information.
To evaluate and compare the accuracy, conciseness, and readability of responses from OpenAI ChatGPT-4 and Google Bard to patient inquiries concerning the novel 177Lu-PSMA-617 therapy for prostate cancer.
Two experts listed the 12 questions most commonly asked by patients about 177Lu-PSMA-617 therapy. These 12 questions were posed to OpenAI ChatGPT-4 and Google Bard. The AI-generated responses were distributed via an online survey platform (Qualtrics) and blindly rated by eight experts. The performance of the AI chatbots was evaluated and compared across three domains: accuracy, conciseness, and readability. Potential safety concerns associated with the AI-generated answers were also examined. The Mann-Whitney U and chi-square tests were used to compare the performances of the AI chatbots.
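To illustrate the two tests named above, the sketch below applies a Mann-Whitney U test to hypothetical accuracy ratings and a chi-square test to hypothetical counts of misleading answers. None of the numbers are the study's data, and the abstract does not describe the analysis beyond naming the tests.

```python
from scipy.stats import mannwhitneyu, chi2_contingency

# Hypothetical expert accuracy ratings for each chatbot's answers.
chatgpt4_acc = [3, 3, 4, 2, 3, 4, 3, 3]
bard_acc     = [3, 2, 3, 2, 3, 3, 2, 3]

# Mann-Whitney U test: compares the two independent rating distributions.
u_stat, u_p = mannwhitneyu(chatgpt4_acc, bard_acc, alternative="two-sided")

# Chi-square test: compares counts of answers flagged as misleading vs. not
# (hypothetical counts, laid out as a 2 x 2 contingency table).
contingency = [[4, 92],    # ChatGPT-4: misleading, not misleading
               [11, 85]]   # Bard:      misleading, not misleading
chi2, chi_p, dof, _ = chi2_contingency(contingency)

print(f"Mann-Whitney U = {u_stat:.1f}, p = {u_p:.3f}")
print(f"chi-square = {chi2:.2f} (df = {dof}), p = {chi_p:.3f}")
```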
Eight experts participated in the survey, evaluating 12 AI-generated responses across the three domains of accuracy, conciseness, and readability, resulting in 96 assessments (12 responses × 8 experts) for each domain per chatbot. ChatGPT-4 provided more accurate answers than Bard (2.95 ± 0.671 vs 2.73 ± 0.732, p = 0.027). Bard's responses had better readability than ChatGPT-4's (2.79 ± 0.408 vs 2.94 ± 0.243, p = 0.003). Both ChatGPT-4 and Bard achieved comparable conciseness scores (3.14 ± 0.659 vs 3.11 ± 0.679, p = 0.798). Experts categorized the AI-generated responses as incorrect or partially correct at a rate of 16.6% for ChatGPT-4 and 29.1% for Bard. Bard's answers contained significantly more misleading information than those of ChatGPT-4 (p = 0.039).
AI chatbots have gained significant attention, and their performance is continuously improving. Nonetheless, these technologies still need further improvements to be considered reliable and credible sources for patients seeking medical information on 177Lu-PSMA-617 therapy.
Belge Bilgin G, Bilgin C, Childs DS, Orme JJ, Burkett BJ, Packard AT, Johnson DR, Thorpe MP, Riaz IB, Halfdanarson TR, Johnson GB, Sartor O, Kendi AT
《Frontiers in Oncology》
Evaluating ChatGPT to test its robustness as an interactive information database of radiation oncology and to assess its responses to common queries from radiotherapy patients: A single institution investigation.
Commercial vendors have created artificial intelligence (AI) tools for use in all aspects of life and medicine, including radiation oncology, and AI innovations will likely disrupt workflows in the field. However, limited data exist on the quality of the radiation oncology information provided by AI-based chatbots. This study aims to assess the accuracy of ChatGPT, an AI-based chatbot, in answering patients' questions during their first visit to the radiation oncology outpatient department and to test ChatGPT's knowledge of radiation oncology.
A set of ten standard patient questions commonly encountered in outpatient department practice was compiled. A blinded expert opinion was obtained for these ten questions, and the same questions were posed to ChatGPT version 3.5 (ChatGPT 3.5). The answers from the expert and from ChatGPT were independently evaluated for accuracy by three scientific reviewers. Additionally, the similarity between ChatGPT's and the expert's answers was assessed with a response score for each answer. Word count, Flesch-Kincaid readability score, and grade level were calculated for the responses from the expert and from ChatGPT, and the answers of ChatGPT and the expert were compared on a Likert scale. As a second component of the study, we tested the technical knowledge of ChatGPT: ten multiple-choice questions of increasing difficulty (basic, intermediate, and advanced) were framed, and ChatGPT's responses were evaluated. Statistical testing was done using SPSS version 27.
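The abstract does not name the specific statistical test run in SPSS. As one plausible, purely illustrative analog, the sketch below applies a paired Wilcoxon signed-rank test to hypothetical per-question word counts for the expert and ChatGPT answers; both the test choice and the numbers are assumptions, not the authors' analysis.

```python
from scipy.stats import wilcoxon

# Hypothetical per-question word counts for the ten answers (not study data).
expert_words  = [62, 80, 55, 70, 90, 48, 66, 75, 58, 83]
chatgpt_words = [180, 210, 150, 195, 240, 170, 160, 205, 185, 220]

# Paired, per-question comparison of answer length; the Wilcoxon signed-rank
# test is used here only as a plausible non-parametric stand-in, since the
# abstract does not name the exact test run in SPSS.
stat, p = wilcoxon(expert_words, chatgpt_words)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.4f}")
```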
After expert review, the accuracy of the expert opinion was 100% and that of ChatGPT was 80% (8/10) for the routine questions encountered in outpatient department visits. A noticeable difference was observed in the word count and readability of the answers from the expert and from ChatGPT. Of the ten multiple-choice questions assessing the radiation oncology knowledge base, ChatGPT answered 90% (9/10) correctly. One answer to a basic-level question was incorrect, whereas all answers to intermediate- and advanced-level questions were correct.
ChatGPT provided reasonably accurate information about routine questions encountered at a patient's first outpatient department visit and demonstrated sound knowledge of the subject. The results of our study can inform the future development of educational tools in radiation oncology and may have implications for other medical fields. This is the first study to provide insight into two potentially positive capabilities of ChatGPT: first, its responses to common patient queries at outpatient department visits, and second, its radiation oncology knowledge base.
Pandey VK, Munshi A, Mohanti BK, Bansal K, Rastogi K
《-》