Assessing the efficacy of artificial intelligence to provide peri-operative information for patients with a stoma.
Stomas present significant lifestyle and psychological challenges for patients, requiring comprehensive education and support. Current educational methods are limited in how well they tailor information to the individual patient, highlighting a potential role for artificial intelligence (AI). This study examined the utility of AI in enhancing stoma therapy management following colorectal surgery.
We compared the efficacy of four prominent large language models (LLMs): OpenAI's ChatGPT-3.5 and ChatGPT-4.0, Google's Gemini, and Bing's CoPilot, against a series of metrics to evaluate their suitability as supplementary clinical tools. Through qualitative and quantitative analyses, including readability scores (Flesch-Kincaid, Flesch Reading Ease, and Coleman-Liau index) and reliability assessments (Likert scale, DISCERN score, and QAMAI tool), the study aimed to assess the appropriateness of LLM-generated advice for patients managing stomas.
Readability and reliability varied across the evaluated models, with CoPilot and ChatGPT-4 demonstrating superior performance on several key metrics, such as readability and comprehensiveness. However, the study underscores the nascent stage of LLM technology in clinical applications. All responses required a high school to college reading level to comprehend comfortably. While the LLMs addressed users' questions directly, they did not incorporate patient-specific factors such as past medical history, producing broad, generic responses rather than tailored advice.
The complexity of individual patient conditions can challenge AI systems. The use of LLMs in clinical settings holds promise for improving patient education and stoma management support, but requires careful consideration of the models' capabilities and the context of their use.
Lim B, Lirios G, Sakalkale A, Satheakeerthy S, Hayes D, Yeung JMC

《-》
Can AI Answer My Questions? Utilizing Artificial Intelligence in the Perioperative Assessment for Abdominoplasty Patients.
Abdominoplasty is a common operation performed for a range of cosmetic and functional indications, often in the context of divarication of recti, after significant weight loss, and after pregnancy. Despite this, patient-surgeon communication gaps can hinder informed decision-making. The integration of large language models (LLMs) in healthcare offers potential for enhancing patient information. This study evaluated the feasibility of using LLMs to answer perioperative queries.
This study assessed the efficacy of four leading LLMs: OpenAI's ChatGPT-3.5, Anthropic's Claude, Google's Gemini, and Bing's CoPilot, using fifteen unique prompts. All outputs were evaluated for readability using the Flesch-Kincaid, Flesch Reading Ease score, and Coleman-Liau index. The DISCERN score and a Likert scale were used to evaluate quality. Scores were assigned by two plastic surgery residents and then reviewed and discussed by five plastic surgeon specialists until a consensus was reached.
ChatGPT-3.5 required the highest reading level for comprehension, followed by Gemini, Claude, then CoPilot. Claude provided the most appropriate and actionable advice. In terms of patient-friendliness, CoPilot outperformed the rest, enhancing engagement and information comprehensiveness. ChatGPT-3.5 and Gemini offered adequate, though unremarkable, advice, employing more professional language. CoPilot uniquely included visual aids and was the only model to use hyperlinks, although these were of limited usefulness, and it faced limitations in responding to certain queries.
ChatGPT-3.5, Gemini, Claude, and Bing's CoPilot showcased differences in readability and reliability. LLMs offer unique advantages for patient care but require careful selection. Future research should integrate LLM strengths and address weaknesses for optimal patient education.
This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
Lim B, Seth I, Cuomo R, Kenney PS, Ross RJ, Sofiadellis F, Pentangelo P, Ceccaroni A, Alfano C, Rozen WM

《-》
Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.
Artificial intelligence (AI) and the introduction of Large Language Model (LLM) chatbots have become a common source of patient inquiry in healthcare. The quality and readability of AI-generated patient education materials (PEM) have been the subject of many studies across multiple medical topics. Most demonstrate poor readability and acceptable quality. However, an area yet to be investigated is chemotherapy-induced cardiotoxicity. This study seeks to assess the quality and readability of chatbot-created PEM on chemotherapy-induced cardiotoxicity.
We conducted an observational cross-sectional study in August 2024 by asking 10 questions to 4 chatbots: ChatGPT, Microsoft Copilot (Copilot), Google Gemini (Gemini), and Meta AI (Meta). The generated material was assessed for readability using 7 tools: Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), Simple Measure of Gobbledygook (SMOG) Index, Automated Readability Index (ARI), and FORCAST Grade Level. Quality was assessed using modified versions of 2 validated tools: the Patient Education Materials Assessment Tool (PEMAT), which outputs a score from 0% to 100%, and DISCERN, a 1 (unsatisfactory) to 5 (highly satisfactory) scoring system. Descriptive statistics were used to evaluate performance and compare the chatbots with each other.
Mean reading grade level (RGL) across all chatbots was 13.7. Calculated RGLs for ChatGPT, Copilot, Gemini, and Meta were 14.2, 14.0, 12.5, and 14.2, respectively. The mean DISCERN score across the chatbots was 4.2. DISCERN scores for ChatGPT, Copilot, Gemini, and Meta were 4.2, 4.3, 4.2, and 3.9, respectively. Median PEMAT scores for understandability and actionability were 91.7% and 75%, respectively. Understandability and actionability scores for ChatGPT, Copilot, Gemini, and Meta were 100% and 75%, 91.7% and 75%, 90.9% and 75%, and 91.7% and 50%, respectively. AI chatbots produce high-quality PEM with poor readability.
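The two Flesch formulas cited throughout these studies are published fixed-coefficient equations over words per sentence and syllables per word. The sketch below computes both; the syllable counter is a deliberately naive vowel-group heuristic (the commercial tools used in these papers rely on dictionaries and exception lists, so their counts will differ).

```python
import re

def _syllables(word):
    # Naive heuristic: count runs of consecutive vowels.
    # Real readability tools use dictionaries and exceptions,
    # so counts (and therefore scores) may differ slightly.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(_syllables(w) for w in words)
    wps = len(words) / sentences   # mean words per sentence
    spw = syllables / len(words)   # mean syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Higher FRES means easier text (scores above roughly 60 correspond to plain language), while FKGL maps directly onto US school grade levels, which is why mean grade levels near 13-14 in these results imply college-level material.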
We do not discourage using chatbots to create PEM but recommend cautioning patients about their limited readability. AI chatbots are not an alternative to a healthcare provider. Furthermore, there is no consensus on which chatbot creates the highest-quality PEM. Future studies are needed to assess the effectiveness of AI chatbots in providing PEM to patients and how their capabilities change over time.
Stephenson-Moe CA, Behers BJ, Gibons RM, Behers BM, Jesus Herrera L, Anneaud D, Rosario MA, Wojtas CN, Bhambrah S, Hamad KM

《-》
Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model.
To use an artificial intelligence (AI)-powered large language model (LLM) to improve readability of patient handouts.
Review of online material modified by AI.
Academic center.
Five handout materials obtained from the American Rhinologic Society (ARS) and the American Academy of Facial Plastic and Reconstructive Surgery websites were assessed using validated readability metrics. The handouts were input into OpenAI's ChatGPT-4 with the prompt: "Rewrite the following at a 6th-grade reading level." The understandability and actionability of both the native and LLM-revised versions were evaluated using the Patient Education Materials Assessment Tool (PEMAT). Results were compared using Wilcoxon rank-sum tests.
The mean readability scores of the standard (ARS, American Academy of Facial Plastic and Reconstructive Surgery) materials corresponded to "difficult," with reading categories ranging between high school and university grade levels. Conversely, the LLM-revised handouts had an average seventh-grade reading level. LLM-revised handouts had better readability in nearly all metrics tested: Flesch-Kincaid Reading Ease (70.8 vs 43.9; P < .05), Gunning Fog Score (10.2 vs 14.42; P < .05), Simple Measure of Gobbledygook (9.9 vs 13.1; P < .05), Coleman-Liau (8.8 vs 12.6; P < .05), and Automated Readability Index (8.2 vs 10.7; P = .06). PEMAT scores were significantly higher in the LLM-revised handouts for understandability (91% vs 74%; P < .05) with similar actionability (42% vs 34%; P = .15) when compared to the standard materials.
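The Wilcoxon rank-sum comparison used here can be sketched in a few lines of standard-library Python using the normal approximation. This is illustrative only: the sample values are invented, not the study's data, the code assumes no tied scores, and real analyses should use a statistics package (e.g. scipy or R), which handles ties and exact small-sample p-values.

```python
import math
from itertools import chain

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.
    Assumes no tied values; illustrative sketch, not a scipy/R replacement."""
    combined = sorted(chain(a, b))
    ranks = {v: i + 1 for i, v in enumerate(combined)}
    w = sum(ranks[v] for v in a)          # rank sum of group a
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2         # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    # two-sided p from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical Flesch Reading Ease scores, LLM-revised vs standard handouts
revised = [70.8, 68.0, 72.1, 71.3, 69.5]
standard = [43.9, 45.2, 44.1, 46.0, 42.3]
p = rank_sum_p(revised, standard)
```

Because the rank-sum test compares rank orderings rather than raw values, it suits small samples of readability scores with no normality assumption, which is why it appears in studies of this size.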
Patient-facing handouts can be augmented by ChatGPT with simple prompting to tailor information with improved readability. This study demonstrates the utility of LLMs to aid in rewriting patient handouts and may serve as a tool to help optimize education materials.
Level VI.
Swisher AR, Wu AW, Liu GC, Lee MK, Carle TR, Tang DM

《-》