The promising role of chatbots in keratorefractive surgery patient education.
To evaluate the appropriateness, understandability, actionability, and readability of responses provided by ChatGPT-3.5, Bard, and Bing Chat to frequently asked questions about keratorefractive surgery (KRS).
Thirty-eight frequently asked questions about KRS were posed three times each to fresh ChatGPT-3.5, Bard, and Bing Chat interfaces. Two experienced refractive surgeons categorized the chatbots' responses according to their appropriateness, and the accuracy of the responses was assessed using the Structure of Observed Learning Outcome (SOLO) taxonomy. The Flesch Reading Ease (FRE) score and Coleman-Liau Index (CLI) were used to evaluate the readability of the chatbots' responses. Furthermore, the understandability and actionability of the responses were evaluated using the Patient Education Materials Assessment Tool (PEMAT).
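For reference, the two readability metrics named above are simple surface-level formulas. The following is a minimal Python sketch of the standard FRE and CLI equations; the word, sentence, and syllable counting here is a naive illustration (an assumption of this sketch), and published analyses such as this one typically rely on validated calculators rather than ad hoc code.

```python
# Illustrative sketch of the Flesch Reading Ease (FRE) and Coleman-Liau
# Index (CLI) formulas. Tokenization below is deliberately naive.
import re

def counts(text: str):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    letters = sum(len(w) for w in words)
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words)
    return len(words), sentences, letters, syllables

def flesch_reading_ease(text: str) -> float:
    words, sentences, _, syllables = counts(text)
    # FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    # Higher scores indicate easier reading (60-70 ~ plain English).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def coleman_liau_index(text: str) -> float:
    words, sentences, letters, _ = counts(text)
    L = letters / words * 100      # average letters per 100 words
    S = sentences / words * 100    # average sentences per 100 words
    # CLI = 0.0588*L - 0.296*S - 15.8, an approximate U.S. grade level.
    return 0.0588 * L - 0.296 * S - 15.8
```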
The appropriateness of the ChatGPT-3.5, Bard, and Bing Chat responses was 86.8% (33/38), 84.2% (32/38), and 81.5% (31/38), respectively (P>0.05). According to the SOLO taxonomy, ChatGPT-3.5 (3.91±0.44) achieved the highest mean accuracy, followed by Bard (3.64±0.61) and Bing Chat (3.19±0.55). For understandability (mean PEMAT-U score: ChatGPT-3.5, 68.5%; Bard, 78.6%; Bing Chat, 67.1%; P<0.05) and actionability (mean PEMAT-A score: ChatGPT-3.5, 62.6%; Bard, 72.4%; Bing Chat, 60.9%; P<0.05), Bard scored better than the other chatbots. Both readability analyses showed that Bing Chat had the highest readability, followed by ChatGPT-3.5 and Bard; however, the understandability and readability of all responses remained more demanding than the recommended levels.
Artificial intelligence-supported chatbots have the potential to provide detailed and appropriate responses about KRS at acceptable levels. While promising for patient education in KRS, chatbots require further improvement, especially in readability and understandability.
Doğan L, Özer Özcan Z, Edhem Yılmaz I
《-》
Artificial Doctors: Performance of Chatbots as a Tool for Patient Education on Keratoconus.
We aimed to compare the answers given by ChatGPT, Bard, and Copilot with those obtained from the American Academy of Ophthalmology (AAO) website to patient-written questions about keratoconus in terms of accuracy, understandability, actionability, and readability, to determine whether chatbots can be used in patient education.
Twenty patient-written questions about keratoconus obtained from the AAO website were posed to ChatGPT, Bard, and Copilot. Two ophthalmologists independently assessed the answers from the chatbots and the AAO website for accuracy, understandability, and actionability using the Structure of Observed Learning Outcome taxonomy, the Patient Education Materials Assessment Tool-Understandability, and the Patient Education Materials Assessment Tool-Actionability, respectively. The answers were also compared for readability according to Flesch Reading Ease scores obtained through the website.
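As context for the PEMAT scores reported below, a PEMAT percentage is derived from binary item ratings. The sketch below is a minimal, assumed illustration of that arithmetic (items rated agree/disagree or marked not applicable, with the score being the share of applicable items rated "agree"); it is not the authors' scoring code.

```python
# Sketch of how a PEMAT percentage score is derived from item ratings.
# Each item is rated "agree" (1) or "disagree" (0), or marked not
# applicable (None); the score is the percentage of applicable items
# rated "agree".
from typing import Optional

def pemat_score(ratings: list[Optional[int]]) -> float:
    applicable = [r for r in ratings if r is not None]  # drop N/A items
    if not applicable:
        raise ValueError("No applicable items were rated.")
    return 100.0 * sum(applicable) / len(applicable)

# Example: 7 actionability items, one not applicable, five rated "agree".
print(pemat_score([1, 1, 0, 1, None, 1, 1]))  # ~83.3
```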
Bard had significantly higher scores than ChatGPT-3.5, Copilot, and the AAO website on the Structure of Observed Learning Outcome taxonomy and the Patient Education Materials Assessment Tool-Understandability (P<0.001 for each), whereas there was no significant difference among the other groups. Bard and ChatGPT achieved significantly higher scores than the AAO website on the Patient Education Materials Assessment Tool-Actionability scale (P=0.001). The AAO website achieved significantly higher scores than Bard on the Flesch Reading Ease scale (P=0.017), whereas there was no significant difference among the other groups.
Chatbots show promise in providing accurate, understandable, and actionable answers and can be a valuable aid in the education of patients with keratoconus under clinician supervision. In this way, patient awareness can be raised while unnecessary hospital visits are prevented and the burden on the health care system is alleviated.
Özer Özcan Z, Doğan L, Yilmaz IE
《-》
Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.
Artificial intelligence (AI) chatbots built on large language models (LLMs) have become a common source of patient inquiry in healthcare. The quality and readability of AI-generated patient education materials (PEM) have been studied across multiple medical topics, with most studies demonstrating poor readability and acceptable quality. However, chemotherapy-induced cardiotoxicity has yet to be investigated. This study assesses the quality and readability of chatbot-created PEM on chemotherapy-induced cardiotoxicity.
We conducted an observational cross-sectional study in August 2024 by asking 10 questions to 4 chatbots: ChatGPT, Microsoft Copilot (Copilot), Google Gemini (Gemini), and Meta AI (Meta). The generated material was assessed for readability using 7 tools: the Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), Simple Measure of Gobbledygook (SMOG) Index, Automated Readability Index (ARI), and FORCAST Grade Level. Quality was assessed using modified versions of 2 validated tools: the Patient Education Materials Assessment Tool (PEMAT), which yields a score from 0% to 100%, and DISCERN, a 1 (unsatisfactory) to 5 (highly satisfactory) scoring system. Descriptive statistics were used to evaluate performance and to compare the chatbots with one another.
The mean reading grade level (RGL) across all chatbots was 13.7; the calculated RGLs for ChatGPT, Copilot, Gemini, and Meta were 14.2, 14.0, 12.5, and 14.2, respectively. The mean DISCERN score across the chatbots was 4.2, with scores of 4.2, 4.3, 4.2, and 3.9 for ChatGPT, Copilot, Gemini, and Meta, respectively. Median PEMAT scores for understandability and actionability were 91.7% and 75%, respectively; the understandability and actionability scores were 100% and 75% for ChatGPT, 91.7% and 75% for Copilot, 90.9% and 75% for Gemini, and 91.7% and 50% for Meta.
AI chatbots produce high-quality PEM with poor readability. We do not discourage using chatbots to create PEM but recommend cautioning patients about readability concerns; AI chatbots are not an alternative to a healthcare provider. Furthermore, there is no consensus on which chatbot creates the highest-quality PEM. Future studies are needed to assess the effectiveness of AI chatbots in providing PEM to patients and how their capabilities change over time.
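The mean RGL reported above pools several grade-level formulas. As a rough reference, the following Python sketch shows three of the cited formulas (FKGL, GFI, SMOG) and the averaging step under stated assumptions; the text-counting helpers are simplistic stand-ins, and the study's exact tooling and weighting are not specified here.

```python
# Sketch of three grade-level readability formulas and of averaging them
# into a mean reading grade level (RGL). Counting helpers are naive
# illustrations; validated calculators should be used for published work.
import math
import re

def _counts(text: str):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)  # 3+ syllables
    return len(words), sentences, sum(syllables), complex_words

def flesch_kincaid_grade(text: str) -> float:
    words, sentences, syls, _ = _counts(text)
    return 0.39 * (words / sentences) + 11.8 * (syls / words) - 15.59

def gunning_fog(text: str) -> float:
    words, sentences, _, complex_words = _counts(text)
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def smog_index(text: str) -> float:
    _, sentences, _, polysyllables = _counts(text)
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

def mean_rgl(text: str) -> float:
    grades = [flesch_kincaid_grade(text), gunning_fog(text), smog_index(text)]
    return sum(grades) / len(grades)
```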
Stephenson-Moe CA, Behers BJ, Gibons RM, Behers BM, Jesus Herrera L, Anneaud D, Rosario MA, Wojtas CN, Bhambrah S, Hamad KM, et al.
《-》