Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey.
Artificial intelligence (AI) has become a popular tool for clinical and research use in the medical field. The aim of this study was to evaluate the accuracy and reliability of a generative AI tool's responses on pediatric familial Mediterranean fever (FMF).
Fifteen questions on pediatric FMF, each repeated three times, were submitted to the popular generative AI tool Microsoft Copilot with ChatGPT-4. Nine pediatric rheumatology experts rated the accuracy of the responses in a blinded manner using a Likert-like scale from 1 to 5.
Median scores for the overall responses ranged from 2.00 to 5.00 at the first assessment, from 2.00 to 4.00 at the second, and from 3.00 to 4.00 at the third. Intra-rater reliability showed poor to moderate agreement (intraclass correlation coefficient range: -0.151 to 0.534). Agreement among experts diminished over time, as highlighted by Krippendorff's alpha values of 0.136 for the first response, 0.132 for the second, and 0.089 for the third. Lastly, experts displayed varying levels of trust in AI before and after the survey.
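For context on the agreement statistics cited above, the following minimal sketch shows how an ordinal Krippendorff's alpha and intraclass correlation coefficients can be computed from a raters-by-questions matrix of Likert scores. It uses hypothetical ratings and assumes the third-party Python packages krippendorff and pingouin; it is not the analysis pipeline used in the study.

```python
# Illustrative only: hypothetical 1-5 Likert ratings from three raters on five questions.
import numpy as np
import pandas as pd
import krippendorff          # pip install krippendorff
import pingouin as pg        # pip install pingouin

# Rows = raters (experts), columns = items (questions).
ratings = np.array([
    [4, 3, 5, 2, 4],
    [3, 3, 4, 2, 5],
    [4, 2, 5, 3, 4],
])

# Krippendorff's alpha for ordinal data; values near 0 indicate poor inter-rater agreement.
alpha = krippendorff.alpha(reliability_data=ratings, level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.3f}")

# Intraclass correlation: reshape to long format (one row per rater-item pair).
long = (
    pd.DataFrame(ratings, index=["R1", "R2", "R3"])
    .rename_axis("rater")
    .reset_index()
    .melt(id_vars="rater", var_name="item", value_name="score")
)
icc = pg.intraclass_corr(data=long, targets="item", raters="rater", ratings="score")
print(icc[["Type", "ICC"]])
```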
AI has promising implications in pediatric rheumatology, including early diagnosis and management optimization, but challenges persist because of uncertain information reliability and the lack of expert validation. Our survey revealed considerable inaccuracies and incompleteness in AI-generated responses on FMF, with poor intra- and inter-rater reliability. Human validation remains crucial in managing AI-generated medical information.
La Bella S, Attanasi M, Porreca A, Di Ludovico A, Maggio MC, Gallizzi R, La Torre F, Rigante D, Soscia F, Ardenti Morini F, Insalaco A, Natale MF, Chiarelli F, Simonini G, De Benedetti F, Gattorno M, Breda L
《Pediatric Rheumatology》
Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.
The COVID-19 pandemic has significantly strained health care systems globally, leading to an overwhelming influx of patients and exacerbating resource limitations. Concurrently, an "infodemic" of misinformation, particularly prevalent in women's health, has emerged. This challenge has been especially acute for health care providers, particularly gynecologists and obstetricians, in managing pregnant women's health. The pandemic heightened the risks that COVID-19 poses to pregnant women, necessitating balanced advice from specialists on vaccine safety versus known risks. In addition, the advent of generative artificial intelligence (AI), such as large language models (LLMs), offers promising support in health care; however, these tools require rigorous testing.
This study aimed to assess LLMs' proficiency, clarity, and objectivity regarding COVID-19's impacts on pregnancy.
This study evaluated 4 major AI prototypes (ChatGPT-3.5, ChatGPT-4, Microsoft Copilot, and Google Bard) using zero-shot prompts on a questionnaire validated among 159 Israeli gynecologists and obstetricians. The questionnaire assessed proficiency in providing accurate information on COVID-19 in relation to pregnancy. Text mining, sentiment analysis, and readability analyses (Flesch-Kincaid Grade Level and Flesch Reading Ease Score) were also conducted.
In terms of LLMs' knowledge, ChatGPT-4 and Microsoft Copilot each scored 97% (32/33), Google Bard 94% (31/33), and ChatGPT-3.5 82% (27/33). ChatGPT-4 incorrectly stated an increased risk of miscarriage due to COVID-19. Google Bard and Microsoft Copilot had minor inaccuracies concerning COVID-19 transmission and complications. In the sentiment analysis, Microsoft Copilot achieved the least negative score (-4), followed by ChatGPT-4 (-6) and Google Bard (-7), while ChatGPT-3.5 obtained the most negative score (-12). Finally, concerning the readability analysis, Flesch-Kincaid Grade Level and Flesch Reading Ease Score showed that Microsoft Copilot was the most accessible at 9.9 and 49, followed by ChatGPT-4 at 12.4 and 37.1, while ChatGPT-3.5 (12.9 and 35.6) and Google Bard (12.9 and 35.8) generated particularly complex responses.
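For reference, the two readability indices reported above are simple functions of average sentence length and average syllables per word. The minimal sketch below (using a rough syllable heuristic and example text, not the study's tooling) shows how they are commonly computed.

```python
# Illustrative Flesch Reading Ease and Flesch-Kincaid Grade Level calculation.
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count runs of consecutive vowels, with a floor of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / sentences   # average sentence length (words per sentence)
    asw = syllables / len(words)   # average syllables per word
    ease = 206.835 - 1.015 * asl - 84.6 * asw   # higher = easier to read
    grade = 0.39 * asl + 11.8 * asw - 15.59     # approximate US school grade
    return ease, grade

ease, grade = readability(
    "Vaccination against COVID-19 is recommended during pregnancy. "
    "It reduces the risk of severe maternal illness."
)
print(f"Flesch Reading Ease: {ease:.1f}, Flesch-Kincaid Grade Level: {grade:.1f}")
```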
The study highlights varying knowledge levels of LLMs in relation to COVID-19 and pregnancy. ChatGPT-3.5 showed the least knowledge and alignment with scientific evidence. Readability and complexity analyses suggest that each AI's approach was tailored to specific audiences, with ChatGPT versions being more suitable for specialized readers and Microsoft Copilot for the general public. Sentiment analysis revealed notable variations in the way LLMs communicated critical information, underscoring the essential role of neutral and objective health care communication in ensuring that pregnant women, particularly vulnerable during the COVID-19 pandemic, receive accurate and reassuring guidance. Overall, ChatGPT-4, Microsoft Copilot, and Google Bard generally provided accurate, updated information on COVID-19 and vaccines in maternal and fetal health, aligning with health guidelines. The study demonstrated the potential role of AI in supplementing health care knowledge, with a need for continuous updating and verification of AI knowledge bases. The choice of AI tool should consider the target audience and required information detail level.
Bragazzi NL, Buchinger M, Atwan H, Tuma R, Chirico F, Szarpak L, Farah R, Khamisy-Farah R
《-》
Evaluating ChatGPT to test its robustness as an interactive information database of radiation oncology and to assess its responses to common queries from radiotherapy patients: A single institution investigation.
Commercial vendors have created artificial intelligence (AI) tools for use in all aspects of life and medicine, including radiation oncology, and AI innovations are likely to disrupt workflows in the field. However, limited data exist on the quality of radiation oncology information provided by AI-based chatbots. This study aims to assess the accuracy of ChatGPT, an AI-based chatbot, in answering patients' questions during their first visit to the radiation oncology outpatient department, and to test ChatGPT's knowledge of radiation oncology.
A blinded expert opinion was formulated for a set of ten standard questions commonly asked by patients during outpatient department visits, and the same questions were posed to ChatGPT version 3.5 (ChatGPT 3.5). The answers from the expert and from ChatGPT were independently evaluated for accuracy by three scientific reviewers, and the degree of similarity between the ChatGPT and expert answers was quantified with a response score for each answer. Word count and Flesch-Kincaid readability score and grade level were calculated for both sets of responses, and the answers of ChatGPT and the expert were compared on a Likert scale. As a second component of the study, the technical knowledge of ChatGPT was tested with ten multiple-choice questions framed in increasing order of difficulty (basic, intermediate, and advanced). Statistical testing was done using SPSS version 27.
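As an illustration of the kind of statistical comparison described in this methods section (the study itself used SPSS), a paired, nonparametric test on hypothetical reviewer Likert ratings for matched expert and ChatGPT answers could be sketched as follows; the scores shown are invented for demonstration only.

```python
# Illustrative only: hypothetical reviewer Likert ratings (1-5) for the same ten
# questions answered by an expert and by ChatGPT, compared with a paired
# nonparametric test (Wilcoxon signed-rank), analogous in spirit to the SPSS analysis.
from scipy.stats import wilcoxon

expert_scores  = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4]
chatgpt_scores = [4, 3, 4, 4, 5, 3, 4, 4, 3, 4]

stat, p_value = wilcoxon(expert_scores, chatgpt_scores)
print(f"Wilcoxon signed-rank statistic: {stat:.1f}, p-value: {p_value:.3f}")
```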
After expert review, the accuracy of the expert opinion was 100%, and that of ChatGPT was 80% (8/10), for regular questions encountered in outpatient department visits. A noticeable difference was observed in the word count and readability of the answers from the expert and from ChatGPT. Of the ten multiple-choice questions assessing the radiation oncology knowledge base, ChatGPT answered 90% correctly (9 out of 10): one answer to a basic-level question was incorrect, whereas all answers to intermediate- and advanced-level questions were correct.
ChatGPT provides reasonably accurate information about routine questions encountered during a patient's first outpatient department (OPD) visit and also demonstrated sound knowledge of the subject. The results of our study can inform the future development of educational tools in radiation oncology and may have implications for other medical fields. This is the first study to provide essential insight into two potentially positive capabilities of ChatGPT: first, its responses to common patient queries at OPD visits, and second, its radiation oncology knowledge base.
Pandey VK, Munshi A, Mohanti BK, Bansal K, Rastogi K
《-》
RETRACTED: Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial.
Chloroquine and hydroxychloroquine have been found to be effective against SARS-CoV-2 and have been reported to be effective in Chinese COVID-19 patients. We evaluated the effect of hydroxychloroquine on respiratory viral loads.
Confirmed French COVID-19 patients were included in a single-arm protocol from early March to March 16th to receive 600 mg of hydroxychloroquine daily, and their viral load in nasopharyngeal swabs was tested daily in a hospital setting. Depending on their clinical presentation, azithromycin was added to the treatment. Untreated patients from another center and cases refusing the protocol were included as negative controls. Presence or absence of virus at day 6 post-inclusion was considered the end point.
Six patients were asymptomatic, 22 had upper respiratory tract infection symptoms, and eight had lower respiratory tract infection symptoms. Twenty cases were treated in this study and showed a significant reduction of viral carriage at day 6 post-inclusion compared with controls, and a much shorter average carriage duration than reported in the literature for untreated patients. Azithromycin added to hydroxychloroquine was significantly more efficient for virus elimination.
Despite its small sample size, our survey shows that hydroxychloroquine treatment is significantly associated with viral load reduction/disappearance in COVID-19 patients and its effect is reinforced by azithromycin.
This article has been retracted: please see Elsevier Policy on Article Withdrawal (https://www.elsevier.com/locate/withdrawalpolicy). Concerns have been raised regarding this article, the substance of which relates to the article's adherence to Elsevier's publishing ethics policies and the appropriate conduct of research involving human participants, as well as concerns raised by three of the authors themselves regarding the article's methodology and conclusions. Elsevier's Research Integrity and Publishing Ethics Team, in collaboration with the journal's co-owner, the International Society of Antimicrobial Chemotherapy (ISAC), and with guidance from an impartial field expert acting in the role of an independent Publishing Ethics Advisor, Dr. Jim Gray, Consultant Microbiologist at the Birmingham Children's and Women's Hospitals, U.K., conducted an investigation and determined that the points below constituted cause for retraction:
• The journal has been unable to confirm whether any of the patients for this study were accrued before ethical approval had been obtained. The ethical approval dates for this article are stated as being 5th and 6th of March 2020 (ANSM and CPP respectively), while the article states that recruitment began in "early March". The 17th author, Prof. Philippe Brouqui, has confirmed that the start date for patient accrual was 6th March 2020. The journal has not been able to establish whether all patients could have entered the study in time for the data to have been analysed and included in the manuscript prior to its submission on 20th March 2020, nor whether all patients were enrolled in the study upon admission as opposed to having been hospitalised for some time before starting the treatment described in the article. Additionally, the journal has not been able to establish whether there was equipoise between the study patients and the control patients.
• The journal has not been able to establish whether the subjects in this study should have provided informed consent to receive azithromycin as part of the study. The journal has concluded that there is reasonable cause to conclude that azithromycin was not considered standard care at the time of the study. The 17th author, Prof. Philippe Brouqui, has attested that azithromycin treatment was not, at the time of the study, an experimental treatment but a possible treatment for, or preventative measure against, bacterial superinfections of viral pneumonia as described in section 2.4 of the article, and as such the treatment should be categorised as standard care that would not require informed consent. This does not fully address the journal's concerns around the use of azithromycin in the study. In section 3.1 of the article, it is stated that six patients received azithromycin to prevent (rather than treat) bacterial superinfection. All of these were amongst the patients who also received hydroxychloroquine (HCQ); none of the control patients are reported to have received azithromycin. This would indicate that only patients in the HCQ arm received azithromycin, all of whom were in one center. The recommendations for the use of macrolides in France at the time the study was conducted indicate that azithromycin would not have been a logical agent to use as first-line prophylaxis against pneumonia, due to the frequency of macrolide resistance amongst bacteria such as pneumococci. These two points suggest that azithromycin would not have been standard practice across southern France at the time the study was conducted and would have required informed consent.
• Three of the authors of this article, Dr. Johan Courjon, Prof. Valérie Giordanengo, and Dr. Stéphane Honoré, have contacted the journal to state that they have concerns regarding the presentation and interpretation of results in this article and that they no longer wish to see their names associated with the article.
• Author Prof. Valérie Giordanengo informed the journal that while the PCR tests administered in Nice were interpreted according to the recommendations of the national reference center, it is believed that those carried out in Marseille were not conducted using the same technique or not interpreted according to the same recommendations, which in her opinion would have resulted in a bias in the analysis of the data. This raises concerns as to whether the study was partially conducted counter to national guidelines at that time. The 17th author, Prof. Philippe Brouqui, has attested that the PCR methodology was explained in reference 17 of the article. However, the article referred to by reference 17 describes several diagnostic approaches that were used (one PCR targeting the envelope protein only; another targeting the spike protein; and three commercially produced systems by QuantiNova, Biofire, and FTD) and does not clarify how the results were interpreted. It has also been noted during investigation of these concerns that only 76% (19/25) of patients were viral culture positive, resulting in uncertainty in the interpretation of PCR reports, as has been raised by Prof. Giordanengo.
As part of the investigation, the corresponding author was contacted and asked to provide an explanation for the above concerns. No response was received within the deadline provided by the journal. Responses were received from the 3rd and 17th authors, Prof. Philippe Parola and Prof. Philippe Brouqui, respectively, and were reviewed as part of the investigation. These two authors, in addition to the 1st author, Dr. Philippe Gautret, the 13th author, Prof. Philippe Colson, and the 15th author, Prof. Bernard La Scola, disagreed with the retraction and dispute the grounds for it. Having followed due process and concluded the aforementioned investigation, and based on the recommendation of Dr. Jim Gray acting in his capacity as independent Publishing Ethics Advisor, the co-owners of the journal (Elsevier and ISAC) have therefore taken the decision to retract the article.
Gautret P, Lagier JC, Parola P, Hoang VT, Meddeb L, Mailhe M, Doudier B, Courjon J, Giordanengo V, Vieira VE, Tissot Dupont H, Honoré S, Colson P, Chabrière E, La Scola B, Rolain JM, Brouqui P, Raoult D
《-》