Chatbots Talk Strabismus: Can AI Become the New Patient Educator?
Strabismus is a common eye condition affecting both children and adults. Effective patient education is crucial for informed decision-making, but traditional methods often lack accessibility and engagement. Chatbots powered by AI have emerged as a promising solution.
This study aims to evaluate and compare the performance of three chatbots (ChatGPT, Bard, and Copilot) and a reliable website, that of the American Association for Pediatric Ophthalmology and Strabismus (AAPOS), in answering real patient questions about strabismus.
The three chatbots were compared with the AAPOS website using real patient questions. Metrics included accuracy (Structure of Observed Learning Outcomes, SOLO, taxonomy), understandability and actionability (Patient Education Materials Assessment Tool, PEMAT), and readability (Flesch-Kincaid). We also performed a sentiment analysis to capture the emotional tone and impact of the responses.
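None of the abstracts collected here specify the software behind the readability scores; as a minimal sketch (assuming the open-source textstat package, which the sources do not mention), the two Flesch metrics used across these studies could be computed as follows:

```python
# Minimal sketch: scoring a response for readability.
# Assumes the open-source `textstat` package (pip install textstat);
# the studies do not say what tooling they actually used.
import textstat

response = (
    "Strabismus is a condition in which the eyes do not line up with "
    "one another. One eye may turn in, out, up, or down."
)

# Flesch Reading Ease: 0-100 scale, higher = easier to read.
ease = textstat.flesch_reading_ease(response)

# Flesch-Kincaid Grade Level: approximate US school grade required.
grade = textstat.flesch_kincaid_grade(response)

print(f"Reading ease: {ease:.1f} | Grade level: {grade:.1f}")
```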
The AAPOS achieved the highest mean SOLO score (4.14 ± 0.47), followed by Bard, Copilot, and ChatGPT. Bard scored highest on both the PEMAT-U (74.8 ± 13.3) and PEMAT-A (66.2 ± 13.6) measures. Flesch Reading Ease scores identified the AAPOS as the easiest to read (mean score: 55.8 ± 14.11), closely followed by Copilot; ChatGPT and Bard scored lower on readability. The sentiment analysis revealed notable differences in emotional tone among the sources.
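The abstract does not name the sentiment tool used; purely as an illustrative sketch (a lexicon-based analyzer such as NLTK's VADER is an assumption, not the authors' stated method), the emotional tone of a response could be scored like this:

```python
# Illustrative sketch of sentiment scoring with NLTK's VADER analyzer.
# The study does not state which sentiment method it used; VADER and the
# sample texts below are assumptions for demonstration only.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

responses = {
    "chatbot": "Surgery is usually safe, and most children recover quickly.",
    "website": "Strabismus surgery carries risks you should discuss with "
               "your ophthalmologist.",
}

for source, text in responses.items():
    # 'compound' ranges from -1 (most negative) to +1 (most positive).
    scores = analyzer.polarity_scores(text)
    print(f"{source}: compound={scores['compound']:+.2f}")
```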
Chatbots, particularly Bard and Copilot, show promise in patient education for strabismus, with strengths in understandability and actionability. However, the AAPOS website outperformed all three chatbots in accuracy and readability.
Edhem Yılmaz İ, Berhuni M, Özer Özcan Z, Doğan L
ChatGPT and Google Assistant as a Source of Patient Education for Patients With Amblyopia: Content Analysis.
We queried ChatGPT (OpenAI) and Google Assistant about amblyopia and compared their answers with the keywords found on the American Association for Pediatric Ophthalmology and Strabismus (AAPOS) website, specifically the section on amblyopia. Out of the 26 keywords chosen from the website, ChatGPT included 11 (42%) in its responses, while Google Assistant included 8 (31%).
Our study investigated the adherence of ChatGPT-3.5 and Google Assistant to the guidelines of the AAPOS for patient education on amblyopia.
ChatGPT-3.5 was used. The four questions, taken from the AAPOS website's glossary section for amblyopia, were as follows: (1) What is amblyopia? (2) What causes amblyopia? (3) How is amblyopia treated? (4) What happens if amblyopia is untreated? The keywords from AAPOS, selected and approved by ophthalmologists (GW and DL), were words or phrases deemed significant for the education of patients with amblyopia. The "Flesch-Kincaid Grade Level" formula, approved by the US Department of Education, was used to evaluate the reading comprehension level of the responses from ChatGPT, Google Assistant, and AAPOS.
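For reference, the standard Flesch-Kincaid Grade Level formula (quoted here for clarity; it is not reproduced in the abstract) is:

$$\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59$$

Higher values correspond to more years of US schooling needed to understand the text.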
In their responses, ChatGPT did not mention the term "ophthalmologist," whereas Google Assistant and AAPOS mentioned it once and twice, respectively. ChatGPT did, however, use the term "eye doctors" once. According to the Flesch-Kincaid test, the average reading level of AAPOS was 11.4 (SD 2.1), the lowest, while that of Google Assistant was 13.1 (SD 4.8), the highest, with the greatest variation in grade level across its responses. ChatGPT's answers averaged a 12.4 (SD 1.1) grade level. All three were similar in reading difficulty. Across the four responses, ChatGPT used 42% (11/26) of the keywords, whereas Google Assistant used 31% (8/26).
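The keyword-coverage percentages above reduce to a simple containment check. A hypothetical sketch follows; the study's full 26-item keyword list is not reproduced in the abstract, so the terms below are illustrative placeholders:

```python
# Hypothetical sketch of the keyword-coverage check described above.
# The full 26-item AAPOS keyword list is not given in the abstract, so the
# terms below are illustrative placeholders, not the study's actual list.
keywords = ["ophthalmologist", "patching", "visual acuity", "refractive error"]

chatbot_answer = (
    "Amblyopia is treated by correcting refractive error with glasses "
    "and by patching the stronger eye."
)

text = chatbot_answer.lower()
matched = [kw for kw in keywords if kw in text]

coverage = len(matched) / len(keywords)
print(f"Matched {len(matched)}/{len(keywords)} keywords ({coverage:.0%})")
```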
ChatGPT trains on texts and phrases and generates new sentences, while Google Assistant automatically copies website links. As ophthalmologists, we should consider including "see an ophthalmologist" on our websites and in our journals. While ChatGPT is here to stay, we, as physicians, need to monitor its answers.
Wu G, Lee DA, Zhao W, Wong A, Jhangiani R, Kurniawan S
《Journal of Medical Internet Research》
Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures.
Introduction: With the potential for artificial intelligence (AI) chatbots to serve as the primary source of glaucoma information for patients, it is essential to characterize the information that chatbots provide so that providers can tailor discussions, anticipate patient concerns, and identify misleading information. The purpose of this study was therefore to evaluate glaucoma information from AI chatbots, including ChatGPT-4, Bard, and Bing, by analyzing response accuracy, comprehensiveness, readability, word count, and character count in comparison to each other and to glaucoma-related American Academy of Ophthalmology (AAO) patient materials.

Methods: Section headers from AAO glaucoma-related patient education brochures were adapted into question form and asked five times of each AI chatbot (ChatGPT-4, Bard, and Bing). Two sets of responses from each chatbot were used to evaluate the accuracy of chatbot responses and AAO brochure information, as well as the comprehensiveness of chatbot responses relative to the AAO brochure information, scored 1-5 by three independent glaucoma-trained ophthalmologists. Readability (assessed with the Flesch-Kincaid Grade Level, FKGL, which corresponds to US school grade levels), word count, and character count were determined for all chatbot responses and AAO brochure sections.

Results: Accuracy scores for AAO, ChatGPT, Bing, and Bard were 4.84, 4.26, 4.53, and 3.53, respectively. On direct comparison, AAO was more accurate than ChatGPT (p=0.002), and Bard was the least accurate (Bard versus AAO, p<0.001; Bard versus ChatGPT, p<0.002; Bard versus Bing, p=0.001). ChatGPT had the most comprehensive responses (ChatGPT versus Bing, p<0.001; ChatGPT versus Bard, p=0.008), with comprehensiveness scores for ChatGPT, Bing, and Bard of 3.32, 2.16, and 2.79, respectively. AAO information and Bard responses were at the most accessible readability levels (AAO versus ChatGPT, AAO versus Bing, Bard versus ChatGPT, Bard versus Bing, all p<0.0001), with readability levels for AAO, ChatGPT, Bing, and Bard at 8.11, 13.01, 11.73, and 7.90, respectively. Bing responses had the lowest word and character counts.

Conclusion: AI chatbot responses varied in accuracy, comprehensiveness, and readability. With accuracy and comprehensiveness scores below those of AAO brochures and elevated readability levels, AI chatbots require improvement to serve as a useful supplementary source of glaucoma information for patients. Physicians must be aware of these limitations so that patients can be asked about their existing knowledge and questions and then provided with clarifying, comprehensive information.
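The pairwise p-values above are reported without naming the underlying test; purely as a hedged illustration (the Mann-Whitney U test and the score arrays below are assumptions, not the authors' method), such a comparison could be run in Python with SciPy:

```python
# Hedged illustration: pairwise comparison of 1-5 accuracy ratings.
# The abstract names no statistical test; the Mann-Whitney U test and the
# placeholder score arrays below are assumptions, not the authors' method.
from scipy.stats import mannwhitneyu

aao_scores = [5, 5, 4, 5, 5, 5, 4, 5]      # placeholder grader ratings
chatgpt_scores = [4, 5, 4, 4, 4, 5, 4, 4]  # placeholder grader ratings

stat, p_value = mannwhitneyu(aao_scores, chatgpt_scores,
                             alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```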
Yalla GR, Hyman N, Hock LE, Zhang Q, Shukla AG, Kolomeyer NN
《Cureus》