-
Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data.
Patient-generated health data (PGHD) captured via smart devices or digital health technologies can reflect an individual's health journey. PGHD enables tracking and monitoring of personal health conditions, symptoms, and medications outside the clinic, which is crucial for self-care and shared clinical decisions. In addition to self-reported measures and structured PGHD (eg, self-screening, sensor-based biometric data), free-text and unstructured PGHD (eg, patient care notes, medical diaries) can provide a broader view of a patient's journey and health condition. Natural language processing (NLP) is used to process and analyze unstructured data to create meaningful summaries and insights, showing promise to improve the utilization of PGHD.
Our aim is to understand and demonstrate the feasibility of an NLP pipeline to extract medication and symptom information from real-world patient and caregiver data.
We report a secondary data analysis, using a data set collected from 24 parents of children with special health care needs (CSHCN) who were recruited via a nonrandom sampling approach. Participants used a voice-interactive app for 2 weeks, generating free-text patient notes (audio transcription or text entry). We built an NLP pipeline using a zero-shot approach (adaptive to low-resource settings). We used named entity recognition (NER) and medical ontologies (RxNorm and SNOMED CT [Systematized Nomenclature of Medicine Clinical Terms]) to identify medications and symptoms. Sentence-level dependency parse trees and part-of-speech tags were used to extract additional entity information using the syntactic properties of a note. We assessed the data; evaluated the pipeline with the patient notes; and reported the precision, recall, and F1 scores.
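The evaluation step named above (precision, recall, and F1) can be illustrated with a minimal sketch. It assumes exact-match scoring over sets of extracted entities, which the abstract does not specify; the `Entity` tuple layout and the sample data are hypothetical and are not taken from the study.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (note_id, entity_type, normalized_text)
Entity = Tuple[int, str, str]


def precision_recall_f1(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match scoring of predicted entities against gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up annotations for illustration only
gold = {(1, "medication", "albuterol"), (1, "symptom", "cough"), (2, "symptom", "fever")}
predicted = {(1, "medication", "albuterol"), (1, "symptom", "cough"), (2, "symptom", "rash")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
```

In this toy case two of the three predicted entities are correct and two of the three gold entities are found, so all three scores equal 2/3.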
In total, 87 patient notes were included (audio transcriptions n=78 and text entries n=9) from 24 parents who had at least one CSHCN. The participants were between the ages of 26 and 59 years. The majority were White (n=22, 92%), had more than one child (n=16, 67%), lived in Ohio (n=22, 92%), had mid- or upper-mid household income (n=15, 62.5%), and had higher level education (n=14, 58%). Out of 87 notes, 30 were drug and medication related, and 46 were symptom related. We captured medication instances (medication, unit, quantity, and date) and symptoms satisfactorily (precision >0.65, recall >0.77, F1>0.72). These results indicate the potential of using NER and dependency parsing in an NLP pipeline for information extraction from unstructured PGHD.
The proposed NLP pipeline was found to be feasible for use with real-world unstructured PGHD to accomplish medication and symptom extraction. Unstructured PGHD can be leveraged to inform clinical decision-making, remote monitoring, and self-care including medical adherence and chronic disease management. With customizable information extraction methods using NER and medical ontologies, NLP models can feasibly extract a broad range of clinical information from unstructured PGHD in low-resource settings (eg, a limited number of patient notes or training data).
Sezgin E, Hussain SA, Rust S, Huang Y, ...
-
A natural language processing pipeline to synthesize patient-generated notes toward improving remote care and chronic disease management: a cystic fibrosis case study.
Patient-generated health data (PGHD) are important for tracking and monitoring out-of-clinic health events and supporting shared clinical decisions. Unstructured text as PGHD (eg, medical diary notes and transcriptions) may encapsulate rich information through narratives, which can be critical to better understanding a patient's condition. We propose a natural language processing (NLP) supported data synthesis pipeline for unstructured PGHD, focusing on children with special healthcare needs (CSHCN), and demonstrate it with a case study on cystic fibrosis (CF).
The proposed unstructured data synthesis and information extraction pipeline extracts a broad range of health information by combining rule-based approaches with pretrained deep-learning models. In particular, we build upon the scispaCy biomedical model suite, leveraging its named entity recognition capabilities to identify and link clinically relevant entities to established ontologies such as Systematized Nomenclature of Medicine (SNOMED) and RxNorm. We then use scispaCy's syntax (grammar) parsing tools to retrieve phrases associated with the entities in medication, dose, therapies, symptoms, bowel movements, and nutrition ontological categories. The pipeline is illustrated and tested with simulated CF patient notes.
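The general shape of such a hybrid extraction step can be sketched with the Python standard library only. This is a deliberately simplified stand-in, not the scispaCy-based pipeline described above: a tiny hypothetical lexicon replaces ontology-linked NER, fuzzy string matching from `difflib` approximates tolerance to misspellings, and a regular expression replaces syntax-based dose retrieval. The lexicon entries, the dose pattern, and the sample note are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical mini-lexicon standing in for ontology-linked concepts
LEXICON = {
    "albuterol": "medication",
    "pulmozyme": "medication",
    "cough": "symptom",
    "wheezing": "symptom",
}

# Hypothetical dose pattern: a quantity followed by a unit
DOSE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|ml|puffs?)\b", re.IGNORECASE)


def tag_note(note, cutoff=0.8):
    """Return (entities, doses) found in a free-text note.

    entities: list of (surface_form, lexicon_term, category)
    doses: list of (quantity, unit) strings
    """
    entities = []
    for token in re.findall(r"[A-Za-z]+", note.lower()):
        # Fuzzy lookup tolerates small spelling variations
        match = difflib.get_close_matches(token, list(LEXICON), n=1, cutoff=cutoff)
        if match:
            entities.append((token, match[0], LEXICON[match[0]]))
    doses = [(quantity, unit.lower()) for quantity, unit in DOSE.findall(note)]
    return entities, doses


# Made-up note with a misspelled medication name
entities, doses = tag_note("Gave 2 puffs of albuterall this morning, still has a cough")
```

Here the misspelled "albuterall" is still resolved to the lexicon term "albuterol". A real pipeline would additionally use the parse tree to attach each dose to the correct medication, which this sketch does not attempt.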
The proposed hybrid deep-learning rule-based approach can operate over a variety of natural language note types and allow customization for a given patient or cohort. Viable information was successfully extracted from simulated CF notes. This hybrid pipeline is robust to misspellings and varied word representations and can be tailored to accommodate the needs of a specific patient, cohort, or clinician.
The NLP pipeline can extract predefined or ontology-based entities from free-text PGHD, aiming to facilitate remote care and improve chronic disease management. Our implementation makes use of open source models, allowing this solution to be easily replicated and integrated in different health systems. Outside of the clinic, the use of the NLP pipeline may increase the amount of clinical data recorded by families of CSHCN and ease the process of identifying health events from the notes. Similarly, care coordinators, nurses, and clinicians would be able to track adherence to medications, identify symptoms, and effectively intervene to improve clinical care. Furthermore, visualization tools can be applied to digest the structured data produced by the pipeline in support of the decision-making process for a patient, caregiver, or provider.
Our study demonstrated that an NLP pipeline can be used to create an automated analysis and reporting mechanism for unstructured PGHD. Further studies are suggested with real-world data to assess pipeline performance and further implications.
Hussain SA, Sezgin E, Krivchenia K, Luna J, Rust S, Huang Y, ...
-
"Hey Siri, Help Me Take Care of My Child": A Feasibility Study With Caregivers of Children With Special Healthcare Needs Using Voice Interaction and Automatic Speech Recognition in Remote Care Management.
About 23% of households in the United States have at least one child who has special healthcare needs. As most care activities occur at home, there is often a disconnect and lack of communication between families, home care nurses, and healthcare providers. Digital health technologies may help bridge this gap.
We conducted a pre-post study with a voice-enabled medical note taking (diary) app (SpeakHealth) in a real world setting with caregivers (parents, family members) of children with special healthcare needs (CSHCN) to understand feasibility of voice interaction and automatic speech recognition (ASR) for medical note taking at home.
In total, 41 parents of CSHCN were recruited. Participants completed a pre-study survey collecting demographic details and technology and care management preferences. Of the 41, 24 participants completed the study, using the app for 2 weeks and completing an exit survey. The app facilitated caregiver note-taking using voice interaction and ASR. The exit survey collected feedback on technology adoption and changes in technology preferences in care management. We assessed the feasibility of the app by descriptively analyzing survey responses and user data following the key focus areas of acceptability, demand, implementation and integration, and adaptation and expansion. Perceived effectiveness of the app was assessed by comparing perceived changes in mobile app preferences among participants. In addition, the voice data, notes, and transcriptions were descriptively analyzed to understand the feasibility of the app.
The majority of the recruited parents were 35-44 years old (22, 53.7%), part of a two-parent household (30, 73.2%), white (37, 90.2%), had more than one child (31, 75.6%), lived in Ohio (37, 90.2%), and used mobile health apps, mobile note taking apps or calendar apps (28, 68.3%) and patient portal apps (22, 53.7%) to track symptoms and health events at home. Caregivers had experience with voice technology as well (32, 78%). Among those who completed the post-study survey (on a 1-5 Likert scale), ~80% of the caregivers agreed or strongly agreed that using the app would enhance their performance in completing tasks (perceived usefulness; mean = 3.4, SD = 0.8), the app is free of effort (perceived ease of use; mean = 3.2, SD = 0.9), and they would use the app in the future (behavioral intention; mean = 3.1, SD = 0.9). In total, 88 voice interactive patient notes were generated, with the majority of the voice recordings being less than 20 s in length (66%). Most noted symptoms and conditions, medications, treatment and therapies, and patient behaviors. More than half of the caregivers reported that voice interaction with the app and using transcribed notes positively changed their preference of technology to use and methods for tracking symptoms and health events at home.
Our findings suggested that voice interaction and ASR use in mobile apps are feasible and effective in keeping track of symptoms and health events at home. Future work is suggested toward using integrated and intelligent systems with voice interactions with broader populations.
Sezgin E, Oiler B, Abbott B, Noritz G, Huang Y, ...
《Frontiers in Public Health》
-
Identification of Preanesthetic History Elements by a Natural Language Processing Engine.
Methods that can automate, support, and streamline the preanesthesia evaluation process may improve resource utilization and efficiency. Natural language processing (NLP) involves the extraction of relevant information from unstructured text data. We describe the utilization of a clinical NLP pipeline intended to identify elements relevant to preoperative medical history by analyzing clinical notes. We hypothesized that the NLP pipeline would identify a significant portion of the pertinent history captured by a perioperative provider.
For each patient, we collected all pertinent notes from the institution's electronic medical record that were available no later than 1 day before their preoperative anesthesia clinic appointment. Pertinent notes included free-text notes consisting of history and physical, consultation, outpatient, inpatient progress, and previous preanesthetic evaluation notes. The free-text notes were processed by a Named Entity Recognition pipeline, an NLP machine learning model trained to recognize and label spans of text that corresponded to medical concepts. These medical concepts were then mapped to a list of medical conditions that were of interest for a preanesthesia evaluation. For each condition, we calculated the percentage of time across all patients in which (1) the NLP pipeline and the anesthesiologist both captured the condition; (2) the NLP pipeline captured the condition but the anesthesiologist did not; and (3) the NLP pipeline did not capture the condition but the anesthesiologist did.
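The comparison described above can be sketched as a simple tally over (patient, condition) pairs. This is a minimal illustration under the assumption that each pair is reduced to two booleans (condition captured by the NLP pipeline, condition captured by the anesthesiologist) and that agreement counts both shared presence and shared absence; the function name and the sample data are hypothetical.

```python
from collections import Counter


def agreement_breakdown(pairs):
    """Percentages of agreement and of each kind of disagreement.

    pairs: iterable of (nlp_found, clinician_found) booleans,
    one entry per (patient, condition) combination.
    """
    counts = Counter()
    for nlp_found, clinician_found in pairs:
        if nlp_found and clinician_found:
            counts["both"] += 1
        elif nlp_found:
            counts["nlp_only"] += 1
        elif clinician_found:
            counts["clinician_only"] += 1
        else:
            counts["neither"] += 1
    total = sum(counts.values())
    if total == 0:
        return {"agree": 0.0, "nlp_only": 0.0, "clinician_only": 0.0}
    return {
        # Agreement covers presence (both) and absence (neither)
        "agree": 100 * (counts["both"] + counts["neither"]) / total,
        "nlp_only": 100 * counts["nlp_only"] / total,
        "clinician_only": 100 * counts["clinician_only"] / total,
    }


# Made-up data: 5 both, 3 neither, 1 NLP only, 1 clinician only
sample_pairs = [(True, True)] * 5 + [(False, False)] * 3 + [(True, False), (False, True)]
breakdown = agreement_breakdown(sample_pairs)
```

The three percentages always sum to 100, which gives a quick sanity check on any reported breakdown of this kind.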
A total of 93 patients were included in the NLP pipeline input. Free-text notes were extracted from the electronic medical record of these patients for a total of 9765 notes. The NLP pipeline and anesthesiologist agreed in 81.24% of instances on the presence or absence of a specific condition. The NLP pipeline identified information that was not noted by the anesthesiologist in 16.57% of instances and did not identify a condition that was noted by the anesthesiologist's review in 2.19% of instances.
In this proof-of-concept study, we demonstrated that utilization of NLP produced an output that identified medical conditions relevant to preanesthetic evaluation from unstructured free-text input. Automation of risk stratification tools may provide clinical decision support or recommend additional preoperative testing or evaluation. Future studies are needed to integrate these tools into clinical workflows and validate their efficacy.
Suh HS, Tully JL, Meineke MN, Waterman RS, Gabriel RA, ...
-
Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.
Merging disparate and heterogeneous datasets from clinical routine into a standardized and semantically enriched format to enable multiple uses of the data also means incorporating unstructured data such as medical free texts. Although the extraction of structured data from texts, known as natural language processing (NLP), has been researched extensively, at least for the English language, it is not enough to produce structured output in an arbitrary format. NLP techniques need to be used together with clinical information standards such as openEHR so that data that are still unstructured can be reused and exchanged sensibly.
The aim of the study is to automatically extract crucial information from medical free texts and to transform this unstructured clinical data into a standardized and structured representation by designing and implementing an exemplary pipeline for the processing of pediatric medical histories.
We constructed a pipeline that allows reusing medical free texts such as pediatric medical histories in a structured and standardized way by (1) selecting and modeling appropriate openEHR archetypes as standard clinical information models, (2) defining a German dictionary with crucial text markers serving as an expert knowledge base for an NLP pipeline, and (3) creating mapping rules between the NLP output and the archetypes. The approach was evaluated in a first pilot study by using 50 manually annotated medical histories from the pediatric intensive care unit of the Hannover Medical School.
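The three steps above can be sketched as a dictionary lookup, a regular expression, and a mapping table. This is only an illustration of the idea and assumes nothing about the study's actual dictionary or rules: the German text markers, the temperature pattern, and the archetype labels are hypothetical placeholders, and the labels are not real openEHR archetype identifiers.

```python
import re

# Hypothetical text markers (German term -> concept); not the study's dictionary
TEXT_MARKERS = {"fieber": "fever", "husten": "cough", "erbrechen": "vomiting"}

# Hypothetical mapping rules (concept -> target label); placeholders, not real archetype IDs
MAPPING_RULES = {
    "fever": "ARCHETYPE_SYMPTOM",
    "cough": "ARCHETYPE_SYMPTOM",
    "vomiting": "ARCHETYPE_SYMPTOM",
    "body_temperature": "ARCHETYPE_BODY_TEMPERATURE",
}

# Hypothetical pattern for temperatures written with a decimal comma or point
TEMPERATURE = re.compile(r"(\d{2}(?:[.,]\d)?)\s*°\s*C")


def extract_and_map(text):
    """Extract marker hits and temperature values, then attach a target label."""
    records = []
    for token in re.findall(r"\w+", text.lower()):
        concept = TEXT_MARKERS.get(token)
        if concept:
            records.append({"archetype": MAPPING_RULES[concept], "concept": concept})
    for raw_value in TEMPERATURE.findall(text):
        records.append({
            "archetype": MAPPING_RULES["body_temperature"],
            "concept": "body_temperature",
            # Normalize the German decimal comma before converting
            "value": float(raw_value.replace(",", ".")),
            "unit": "°C",
        })
    return records


# Made-up history snippet: "Fever up to 39.5 °C and cough for two days."
records = extract_and_map("Seit zwei Tagen Fieber bis 39,5 °C und Husten.")
```

Keeping the markers and the mapping rules in separate tables mirrors the separation between the knowledge base and the archetype mapping, so either can be extended without touching the other.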
We successfully reused 24 existing international archetypes to represent the most crucial elements of unstructured pediatric medical histories in a standardized form. The self-developed NLP pipeline was constructed by defining 3055 text marker entries, 132 text events, 66 regular expressions, and a text corpus consisting of 776 entries for automatic correction of spelling mistakes. A total of 123 mapping rules were implemented to transform the extracted snippets to an openEHR-based representation so that they can be stored together with other structured data in an existing openEHR-based data repository. In the first evaluation, the NLP pipeline yielded 97% precision and 94% recall.
The use of NLP and openEHR archetypes was demonstrated as a viable approach for extracting and representing important information from pediatric medical histories in a structured and semantically enriched format. We designed a promising approach with potential to be generalized, and implemented a prototype that is extensible and reusable for other use cases concerning German medical free texts. In the long term, this will harness unstructured clinical data for further research purposes such as the design of clinical decision support systems. Together with structured data already integrated in openEHR-based representations, we aim to develop an interoperable openEHR-based application that is capable of automatically assessing a patient's risk status based on the patient's medical history at the time of admission.
Wulff A, Mast M, Hassler M, Montag S, Marschollek M, Jack T, ...