Prediction and the influencing factor study of colorectal cancer hospitalization costs in China based on machine learning-random forest and support vector regression: a retrospective study.
As people's standard of living improves, the incidence of colorectal cancer is increasing, and colorectal cancer hospitalization costs are relatively high. Therefore, predicting the cost of hospitalization for colorectal cancer patients can provide guidance for controlling healthcare costs and for the development of related policies.
This study used data from the first page of the medical records of colorectal cancer inpatient cases at a tertiary first-class hospital in Shenzhen from 2018 to 2022. The factors influencing hospitalization costs for colorectal cancer were analyzed. Random forest and support vector regression models were used to build predictive models of hospitalization costs for colorectal cancer patients, and the two models were compared and evaluated.
For colorectal cancer inpatients, major procedures, length of stay, level of procedure, Charlson comorbidity index, age, and medical payment method were the important influencing factors. On the test set, the R2 of the random forest model was 0.833 and the R2 of the support vector regression model was 0.824; the root mean square error (RMSE) of the random forest model was 0.029, and the RMSE of the support vector regression model was 0.032. In the random forest model, the weight of the major procedure was the highest (0.286).
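The R2 and RMSE values reported above (and the MAE used in other abstracts in this list) follow standard definitions. The sketch below is a minimal illustration of those definitions, not the authors' code; the function name `regression_metrics` and the toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE for paired observed and predicted values.

    Hypothetical helper for illustration; assumes both sequences have the
    same non-zero length and that y_true is not constant.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Toy values only; these are not data from the study.
observed = [0.10, 0.25, 0.40, 0.55]
predicted = [0.12, 0.22, 0.43, 0.50]
metrics = regression_metrics(observed, predicted)
```

A higher R2 together with a lower RMSE on the same test set is the basis for ranking one model above another in a comparison of this kind.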
Major procedures and length of stay have the greatest impact on hospitalization costs for colorectal cancer patients. The random forest model predicts hospitalization costs for colorectal cancer patients better than the support vector regression model.
Gao J, Liu Y
《Frontiers in Public Health》
Retrospective Study on the Influencing Factors and Prediction of Hospitalization Expenses for Chronic Renal Failure in China Based on Random Forest and LASSO Regression.
Aim: With the improvement in people's living standards, the incidence of chronic renal failure (CRF) is increasing annually. The growing number of patients with CRF has significantly increased pressure on China's medical budget. Predicting hospitalization expenses for CRF can provide guidance for effective allocation and control of medical costs. The purpose of this study was to use the random forest (RF) method and least absolute shrinkage and selection operator (LASSO) regression to predict personal hospitalization expenses of hospitalized patients with CRF and to evaluate related influencing factors.
Methods: The data set was collected from the first page of the medical records of three tertiary first-class hospitals for the whole year of 2016. Factors influencing hospitalization expenses for CRF were analyzed. RF and LASSO regression models were used to establish prediction models for the hospitalization expenses of patients with CRF, and the models were compared and evaluated.
Results: For CRF inpatients, statistically significant differences in hospitalization expenses were found for major procedures, medical payment method, hospitalization frequency, length of stay, number of other diagnoses, and number of procedures. The R2 values of the LASSO regression model and the RF regression model were 0.6992 and 0.7946, respectively. The mean absolute error (MAE) and root mean square error (RMSE) of the LASSO regression model were 0.0268 and 0.043, respectively, and the MAE and RMSE of the RF prediction model were 0.0171 and 0.0355, respectively. In the RF model, the weight of length of stay was the highest (0.730).
Conclusions: The hospitalization expenses of patients with CRF are most affected by length of stay. The RF prediction model is superior to the LASSO regression model and can be used to predict the hospitalization expenses of patients with CRF. Health administration departments may consider formulating accurate individualized hospitalization expense reimbursement mechanisms accordingly.
Dai P, Chang W, Xin Z, Cheng H, Ouyang W, Luo A, ...
《Frontiers in Public Health》
Development of a System for Predicting Hospitalization Time for Patients With Traumatic Brain Injury Based on Machine Learning Algorithms: User-Centered Design Case Study.
The treatment and care of patients with traumatic brain injury (TBI) remain intractable health problems worldwide and greatly increase the medical burden on society. Machine learning algorithms applied to the large amount of data accumulated in clinical practice can predict the hospitalization time of patients with brain injury in advance, which supports reasonable resource planning and can reduce that burden. This approach has particular application value in China, where medical resources are tight.
We aimed to develop a system, based on a machine learning model, that predicts the length of hospitalization of patients with TBI and is available to patients, nurses, and physicians.
We collected information on 1128 patients who received treatment at the Neurosurgery Center of the Second Affiliated Hospital of Anhui Medical University from May 2017 to May 2022, and we trained and tested the machine learning models using 5-fold cross-validation to avoid overfitting; 28 types of independent variables were used as input variables, and the length of hospitalization was used as the output variable. Once the models were trained, we obtained the error and goodness of fit (R2) of each model from the 5 rounds of cross-validation and compared them to select the best predictive model to be encapsulated in the developed system. In addition, we externally tested the models using clinical data from patients treated at the First Affiliated Hospital of Anhui Medical University from June 2021 to February 2022.
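The 5 rounds of cross-validation described above can be sketched as a split of patient indices into 5 folds, each used once as the test set. This is a minimal sketch under assumptions: the abstract does not say how the folds were formed, so the shuffling, the seed, and the helper name `k_fold_indices` are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles the indices once with a fixed seed, deals
    them into k folds, and uses each fold exactly once as the test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 1128 is the cohort size stated in the abstract.
splits = list(k_fold_indices(1128, k=5))
# In each round, a model would be fitted on `train` and scored on `test`;
# the per-round error and R2 are then averaged for comparison across models.
```

Because every patient appears in exactly one test fold, each model is scored on data it was not fitted on in that round, which is what makes the averaged error a check against overfitting.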
Six machine learning models were built: support vector regression, convolutional neural network, back propagation neural network, random forest, logistic regression, and multilayer perceptron. Among them, support vector regression had the smallest error on the test set (10.22%) and the highest goodness of fit (90.4%), performing best among the 6 models on every measure. To rule out a chance result, we also verified the 6 models on external datasets, where support vector regression again performed best. Therefore, we chose to encapsulate the support vector regression model into our system for predicting the length of stay of patients with TBI. Finally, we made the developed system available to patients, nurses, and physicians, and the satisfaction questionnaire showed that all three groups agreed that the system was effective in supporting clinical decisions.
This study shows that the support vector regression model can accurately predict the length of hospitalization of patients with TBI, and that the developed prediction system has strong clinical utility.
Zhou H, Fang C, Pan Y
Predicting length of stay and mortality among hospitalized patients with type 2 diabetes mellitus and hypertension.
Type 2 diabetes mellitus (T2DM) and hypertension (HTN), both non-communicable diseases, are leading causes of death globally, with greater disparities in lower-middle-income countries. Furthermore, poor treatment and management are known to lead to intensified healthcare utilization and increased medical care costs, and to impose a significant societal burden in these countries, including Indonesia. Predicting future clinical outcomes can inform the line of treatment and the value of healthcare costs while ensuring effective patient care. In this paper, we present the prediction of length of stay (LoS) and mortality among patients hospitalized at a tertiary referral hospital in Tasikmalaya, Indonesia, between 2016 and 2019. We also aimed to determine how socio-demographic characteristics and T2DM- or HTN-related comorbidities affect inpatient LoS and mortality.
We analyzed insurance claims data of 4376 patients with T2DM or HTN hospitalized in the referral hospital. We used four prediction models based on machine-learning algorithms for LoS prediction, in relation to disease severity, physician-in-charge, room type, comorbidities, and types of procedures performed. We used five classifiers based on multilayer perceptron (MLP) to predict inpatient mortality and compared them according to training time, testing time, and area under the receiver operating characteristic curve (AUROC). Classifier accuracy measures, which included positive predictive value (PPV), negative predictive value (NPV), F-measure, and recall, were used for performance evaluation.
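The classifier measures named above (PPV, NPV, recall, and F-measure) can all be derived from the four cells of a binary confusion matrix. The sketch below illustrates the standard definitions only; the function name `classification_metrics` and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute PPV, NPV, recall and F-measure from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator is
    non-zero (at least one predicted positive, one predicted negative,
    and one actual positive).
    """
    ppv = tp / (tp + fp)       # positive predictive value (precision)
    npv = tn / (tn + fn)       # negative predictive value
    recall = tp / (tp + fn)    # sensitivity: share of actual positives found
    # F-measure: harmonic mean of PPV and recall.
    f_measure = 2 * ppv * recall / (ppv + recall)
    return {"ppv": ppv, "npv": npv, "recall": recall, "f_measure": f_measure}


# Toy counts only, with death as the positive class; not data from the study.
scores = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```

When the positive class (here, inpatient death) is rare, NPV tends to be high even for a weak classifier, so PPV and recall are usually the more informative figures to compare.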
A linear regression model best predicted inpatient LoS (R2, 0.70; root mean square error [RMSE], 1.96; mean absolute error [MAE], 0.935), and the gradient boosting regression model performed similarly (R2, 0.69; RMSE, 1.96; MAE, 0.935). For inpatient mortality, the best results were observed using MLP with back propagation (AUROC 0.899; 69.33 and 98.61 for PPV and NPV, respectively). The other classifiers (stochastic gradient descent with regression loss function, Huber, and random forest models) all showed average performance.
The linear regression model best predicted LoS, and mortality was best predicted using MLP. Patients with primary diseases such as T2DM or HTN may have comorbidities that can prolong inpatient LoS. Physicians play an important role in disseminating health-related information. These predictions could assist in the development of health policies and strategies that reduce disease burden in resource-limited settings.
Barsasella D, Gupta S, Malwade S, Aminin, Susanti Y, Tirmadi B, Mutamakin A, Jonnagaddala J, Syed-Abdul S, ...