
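Self-citation rate: 7.7%
Citations: 796
Acceptance rate: no data available
Review turnaround: no data available
Publication fee: no data available
Articles by authors from China: 10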
-
Identifying heterogeneous subgroups of systemic autoimmune diseases by applying a joint dimension reduction and clustering approach to immunomarkers.
Citations: - Published: 1970
-
Knowledge-slanted random forest method for high-dimensional data and small sample size with a feature selection application for gene expression data.
The use of prior knowledge in the machine learning framework has been considered a potential tool to handle the curse of dimensionality in genetic and genomic data. Although random forest (RF) represents a flexible non-parametric approach with several advantages, it can provide poor accuracy in high-dimensional settings, mainly in scenarios with small sample sizes. We propose a knowledge-slanted RF that integrates biological networks as prior knowledge into the model to improve its performance and explainability, exemplifying its use for selecting and identifying relevant genes. Knowledge-slanted RF is a combination of two stages. First, prior knowledge represented by graphs is translated by running a random walk with restart algorithm to determine the relevance of each gene based on its connection and localization on a protein-protein interaction network. Then, each relevance is used to modify the selection probability to draw a gene as a candidate split-feature in the conventional RF. Experiments on simulated datasets with very small sample sizes (n ≤ 30), comparing knowledge-slanted RF against conventional RF and logistic lasso regression, suggest improved precision in outcome prediction compared to the other methods. The knowledge-slanted RF was completed with the introduction of a modified version of the Boruta feature selection algorithm. Finally, knowledge-slanted RF identified more biologically relevant genes, offering a higher level of explainability for users than conventional RF. These findings were corroborated in one real case to identify genes relevant to calcific aortic valve stenosis.
Citations: - Published: 1970
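The two stages described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the node names, the restart probability of 0.3, and the function names `random_walk_with_restart` and `sample_candidate_genes` are all assumptions made for illustration. The first function scores genes by a random walk with restart over a small graph; the second draws candidate split-features with probability proportional to those scores, which is one plausible reading of "modify the selection probability".

```python
import random


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the stationary visiting probability of every node.

    adj   : dict mapping node -> list of neighbours (undirected graph).
    seeds : set of nodes the walker jumps back to (assumed prior-knowledge genes).
    """
    nodes = sorted(adj)
    start = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to a seed node.
        nxt = {v: restart * start[v] for v in nodes}
        for u in nodes:
            moving = (1.0 - restart) * prob[u]
            if adj[u]:
                # Otherwise it moves to a uniformly chosen neighbour.
                for v in adj[u]:
                    nxt[v] += moving / len(adj[u])
            else:
                # Isolated node: keep the mass in place so probabilities sum to 1.
                nxt[u] += moving
        change = sum(abs(nxt[v] - prob[v]) for v in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


def sample_candidate_genes(relevance, mtry, rng):
    """Draw up to `mtry` distinct genes, each draw weighted by relevance."""
    # Genes with zero relevance are never drawn in this sketch.
    pool = {g: w for g, w in relevance.items() if w > 0}
    chosen = []
    for _ in range(min(mtry, len(pool))):
        genes = list(pool)
        pick = rng.choices(genes, weights=[pool[g] for g in genes], k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen


# Hypothetical five-node network; "A" plays the role of a known disease gene.
ppi = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
       "D": ["C", "E"], "E": ["D"]}
relevance = random_walk_with_restart(ppi, seeds={"A"})
candidates = sample_candidate_genes(relevance, mtry=3, rng=random.Random(0))
```

In a full random forest, a draw like `candidates` would replace the uniform feature subsample taken at each split, so genes close to the seed in the network are tested as split variables more often.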
-
Enhanced labor pain monitoring using machine learning and ECG waveform analysis for uterine contraction-induced pain.
This study aims to develop an innovative approach for monitoring and assessing labor pain through ECG waveform analysis, utilizing machine learning techniques to monitor pain resulting from uterine contractions. The study was conducted at National Taiwan University Hospital between January and July 2020. We collected a dataset of 6010 ECG samples from women preparing for natural spontaneous delivery (NSD). The ECG data was used to develop an ECG waveform-based Nociception Monitoring Index (NoM). The dataset was divided into training (80%) and validation (20%) sets. Multiple machine learning models, including LightGBM, XGBoost, SnapLogisticRegression, and SnapDecisionTree, were developed and evaluated. Hyperparameter optimization was performed using grid search and five-fold cross-validation to enhance model performance. The LightGBM model demonstrated superior performance with an AUC of 0.96 and an accuracy of 90%, making it the optimal model for monitoring labor pain based on ECG data. Other models, such as XGBoost and SnapLogisticRegression, also showed strong performance, with AUC values ranging from 0.88 to 0.95. This study demonstrates that the integration of machine learning algorithms with ECG data significantly enhances the accuracy and reliability of labor pain monitoring. Specifically, the LightGBM model exhibits exceptional precision and robustness in continuous pain monitoring during labor, with potential applicability extending to broader healthcare settings. ClinicalTrials.gov Identifier: NCT04461704.
Citations: - Published: 1970
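The evaluation protocol in the abstract above (an 80/20 split, five-fold cross-validation, and AUC as the headline metric) can be sketched with the standard library alone. This is an illustrative sketch, not the study's pipeline: the abstract does not say whether the split was stratified or which seed was used, and the function names below are hypothetical. The gradient-boosting models themselves and the grid search over hyperparameters are left out.

```python
import random


def train_validation_split(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into training and validation sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = int(round(n_samples * train_fraction))
    return idx[:cut], idx[cut:]


def k_fold_indices(indices, k=5):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 6010 samples, as reported in the abstract, split 80/20.
train_idx, valid_idx = train_validation_split(6010)
folds = list(k_fold_indices(train_idx, k=5))
```

Each candidate hyperparameter setting would be scored by averaging `roc_auc` over the five held-out folds, with the validation indices kept aside for the final comparison between models.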
-
The goldmine of GWAS summary statistics: a systematic review of methods and tools.
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Citations: - Published: 1970
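As a concrete example of what "meta-analysis" of summary statistics can mean in the abstract above, the sketch below combines per-study effect sizes and standard errors for one variant using a fixed-effect, inverse-variance-weighted estimate. The review does not single out this method; it is shown only because it is a standard technique that needs nothing beyond summary statistics. The function name and the input numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas : per-study effect estimates (e.g. log odds ratios).
    ses   : per-study standard errors of those estimates.
    Returns (pooled beta, pooled standard error, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Two hypothetical studies reporting the same variant.
pooled_beta, pooled_se, z_score, p_value = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
```

Because both studies have the same standard error here, the pooled effect is simply their average, and the pooled standard error shrinks by a factor of the square root of two.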
-
QIGTD: identifying critical genes in the evolution of lung adenocarcinoma with tensor decomposition.
Identifying critical genes is important for understanding the pathogenesis of complex diseases. Traditional studies typically compare changes in biomolecules between normal and disease samples, or detect important vertices in a single static biomolecular network, and so often overlook the dynamic changes that occur between disease stages. However, investigating temporal changes in biomolecular networks and identifying critical genes is essential for understanding the occurrence and development of diseases. A novel method called Quantifying Importance of Genes with Tensor Decomposition (QIGTD) is proposed in this study. It first constructs a time series network by integrating both intra- and inter-temporal network information, preserving connections between networks at adjacent stages according to local similarities. A tensor is employed to describe the connections of this time series network, and a third-order tensor decomposition method is proposed to capture both the topological information of each network snapshot and the time series characteristics of the whole network. QIGTD is also a learning-free and efficient method that can be applied to datasets with a small number of samples. The effectiveness of QIGTD was evaluated using lung adenocarcinoma (LUAD) datasets, with three state-of-the-art methods (T-degree, T-closeness, and T-betweenness) employed as benchmarks. Numerical experimental results demonstrate that QIGTD outperforms these methods in terms of both precision and mAP. Notably, out of the top 50 genes, 29 have been verified to be highly related to LUAD according to the DisGeNET database, and 36 are significantly enriched in LUAD-related Gene Ontology (GO) terms, including nuclear division, mitotic nuclear division, chromosome segregation, organelle fission, and mitotic sister chromatid segregation.
In conclusion, QIGTD effectively captures the temporal changes in gene networks and identifies critical genes. It provides a valuable tool for studying temporal dynamics in biological networks and can aid in understanding the underlying mechanisms of diseases such as LUAD.
Citations: - Published: 1970
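To make the data structure in the abstract above concrete, the sketch below stacks per-stage networks into a third-order tensor (stage × gene × gene) and scores a ranking with precision at k, the metric behind the "29 of the top 50" figure. It is a sketch under stated assumptions, not QIGTD: the abstract does not specify how adjacent stages are coupled through local similarities or how the tensor is decomposed, so both steps are omitted. The ranking used here is a plain degree summed over stages, standing in as a simple baseline. Gene labels `G1` to `G4` and all function names are hypothetical.

```python
def build_snapshot_tensor(snapshots, genes):
    """Stack one adjacency matrix per disease stage into a T x n x n nested list."""
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    tensor = []
    for edges in snapshots:
        layer = [[0] * n for _ in range(n)]
        for u, v in edges:
            i, j = index[u], index[v]
            layer[i][j] = layer[j][i] = 1  # undirected edge
        tensor.append(layer)
    return tensor


def summed_degree_ranking(tensor, genes):
    """Rank genes by their degree summed across all snapshots (illustrative baseline)."""
    scores = {g: sum(sum(layer[i]) for layer in tensor) for i, g in enumerate(genes)}
    # Higher score first; ties broken alphabetically for reproducibility.
    return sorted(genes, key=lambda g: (-scores[g], g))


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked genes that appear in the reference set."""
    return sum(1 for g in ranking[:k] if g in relevant) / k


# Three hypothetical stages over four placeholder genes.
genes = ["G1", "G2", "G3", "G4"]
snapshots = [
    [("G1", "G2")],
    [("G1", "G2"), ("G1", "G3")],
    [("G1", "G3"), ("G3", "G4")],
]
tensor = build_snapshot_tensor(snapshots, genes)
ranking = summed_degree_ranking(tensor, genes)
score = precision_at_k(ranking, relevant={"G1", "G2"}, k=2)
```

With 29 verified genes among the top 50, as reported in the abstract, `precision_at_k` at k = 50 would return 0.58.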