STGNNks: Identifying cell types in spatial transcriptomics data based on graph neural network, denoising auto-encoder, and k-sums clustering.
Spatial transcriptomics technologies capture spatial location information, tissue morphological features, and transcriptional profiles together. Integrating these data can greatly advance our understanding of cell biology in its morphological context.
We developed a new spatial clustering method, STGNNks, that combines a graph neural network, a denoising auto-encoder, and k-sums clustering. First, spatially resolved transcriptomics data are preprocessed and a hybrid adjacency matrix is constructed. Second, gene expression and spatial context are integrated to learn spot embeddings with a graph convolutional network trained under deep graph infomax. Third, the learned features are mapped to a low-dimensional space by a zero-inflated negative binomial (ZINB)-based denoising auto-encoder. Fourth, a k-sums clustering algorithm, which combines k-means clustering and ratio-cut clustering, is developed to identify spatial domains. Finally, spatial trajectory inference, spatially variable gene identification, and differentially expressed gene detection are carried out with the pseudo-space-time method on six 10x Genomics Visium datasets.
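The third step relies on a ZINB likelihood, which treats each observed count as either an excess (dropout) zero, with probability pi, or a draw from a negative binomial with mean mu and inverse dispersion theta. The abstract does not give STGNNks's exact loss, so the sketch below only shows the standard ZINB negative log-likelihood that count-based denoising auto-encoders typically minimise; the assumption is that the decoder outputs one (mu, theta, pi) triple per spot-gene entry, and the function names are hypothetical.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial with mean mu > 0
    and inverse dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x: int, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of one count under a zero-inflated NB,
    where pi in [0, 1) is the probability of an excess (dropout) zero."""
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        nb_zero = math.exp(nb_log_pmf(0, mu, theta))
        return -math.log(pi + (1.0 - pi) * nb_zero)
    # A positive count can only come from the NB component.
    return -(math.log(1.0 - pi) + nb_log_pmf(x, mu, theta))


def zinb_loss(counts, mus, thetas, pis) -> float:
    """Mean ZINB negative log-likelihood over matching flat sequences,
    e.g. a flattened spots-by-genes count matrix and the decoder outputs."""
    terms = [zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, mus, thetas, pis)]
    return sum(terms) / len(terms)
```

For example, `zinb_loss([0, 3, 1], [0.5, 2.0, 1.0], [1.0, 1.5, 2.0], [0.3, 0.1, 0.1])` scores three entries at once. With pi = 0 the loss reduces to the plain negative binomial, and a larger pi makes observed zeros cheaper, which is how the model absorbs dropout.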
We compared STGNNks with five other spatial clustering methods: CCST, Seurat, stLearn, Scanpy, and SEDR. For the first time, four internal clustering indices from machine learning (the silhouette coefficient, the Davies-Bouldin index, the Calinski-Harabasz index, and the S_Dbw index) were used to compare the clustering performance of STGNNks against CCST, Seurat, stLearn, Scanpy, and SEDR on five spatial transcriptomics datasets without labels: Adult Mouse Brain (FFPE), Adult Mouse Kidney (FFPE), Human Breast Cancer (Block A Section 2), Human Breast Cancer (FFPE), and Human Lymph Node. In addition, two external indices, the adjusted Rand index (ARI) and normalized mutual information (NMI), were used to evaluate all six methods on Human Breast Cancer (Block A Section 1), which has ground-truth labels. In these comparisons, STGNNks obtained the smallest Davies-Bouldin and S_Dbw values (lower is better) and the largest silhouette coefficient, Calinski-Harabasz, ARI, and NMI values (higher is better), significantly outperforming the other five spatial transcriptomics analysis algorithms. Furthermore, we detected the top six spatially variable genes and the top five differentially expressed genes in each cluster on the five unlabeled datasets. Finally, a pseudo-space-time tree plot with hierarchical layout showed the progression of Human Breast Cancer (Block A Section 1) along three clades, branching from three invasive ductal carcinoma regions to multiple ductal carcinoma in situ sub-clusters.
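To make the two kinds of evaluation concrete, here is a minimal pure-Python sketch of one internal index (the silhouette coefficient, computed from the embeddings and cluster labels alone) and one external index (the ARI, computed against ground-truth labels). These are the textbook definitions, not code from STGNNks; in practice one would more likely call scikit-learn's `silhouette_score`, `davies_bouldin_score`, `calinski_harabasz_score`, `adjusted_rand_score`, and `normalized_mutual_info_score`. The function names below are illustrative.

```python
import math
from collections import Counter, defaultdict


def silhouette(points, labels) -> float:
    """Mean silhouette coefficient with Euclidean distance (higher is better)."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of i's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def _pairs(n: int) -> float:
    """Number of unordered pairs among n items, C(n, 2)."""
    return n * (n - 1) / 2


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """ARI between two labelings of the same n >= 2 items (1.0 = same partition)."""
    n = len(labels_true)
    sum_ij = sum(_pairs(c) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(_pairs(c) for c in Counter(labels_true).values())
    sum_b = sum(_pairs(c) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / _pairs(n)  # expected pair agreement under chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_ij - expected) / (max_index - expected)
```

The ARI is invariant to how clusters are numbered, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, while the silhouette needs no labels at all beyond the clustering itself, which is why internal indices suit the five unlabeled datasets.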
We anticipate that STGNNks can improve spatial transcriptomics data analysis and further support the diagnosis and therapy of related diseases. The code is publicly available at https://github.com/plhhnu/STGNNks.
Peng L, He X, Peng X, Li Z, Zhang L, ...
Integrating multi-modal information to detect spatial domains of spatial transcriptomics by graph attention network.
Recent advances in spatially resolved transcriptomic technologies have created unprecedented opportunities to elucidate tissue architecture and function in situ. Spatial transcriptomics can provide multimodal, complementary information simultaneously, including gene expression profiles, spatial locations, and histology images. However, most existing methods do not make efficient use of spatial information and matched high-resolution histology images. To fully leverage this multi-modal information, we propose a SPAtially embedded Deep Attentional graph Clustering (SpaDAC) method that identifies spatial domains while reconstructing denoised gene expression profiles. SpaDAC learns low-dimensional embeddings of spatial transcriptomics data by constructing multi-view graph modules that capture both spatial-location connectivity and morphological connectivity. Benchmark results demonstrate that SpaDAC outperforms other algorithms on several recent spatial transcriptomics datasets. SpaDAC is a valuable tool for spatial domain detection, helping to elucidate tissue architecture and the cellular microenvironment. The source code of SpaDAC is freely available on GitHub (https://github.com/huoyuying/SpaDAC.git).
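A multi-view graph of the kind described above can be sketched as one k-nearest-neighbour graph per modality over the same spots: one from spot coordinates and one from per-spot histology image features, with a graph attention layer then weighting each spot's neighbours. The abstract does not say how SpaDAC builds its graphs, which k or distance it uses, or how image features are extracted, so everything below is an assumption: `knn_adjacency`, `multi_view_graphs`, and `gat_attention` are hypothetical helpers, the default k = 6 is only illustrative, and the attention scores follow the generic single-head GAT form e_ij = LeakyReLU(a · [z_i || z_j]) normalised by a softmax over each neighbourhood.

```python
import math


def knn_adjacency(features, k: int):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix (Euclidean distance)."""
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # rank the other spots by distance to spot i in this view
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(features[i], features[j]))
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1  # symmetrise so the graph is undirected
    return adj


def multi_view_graphs(coords, image_feats, k_spatial: int = 6, k_morph: int = 6):
    """Hypothetical helper: one graph per view over the same set of spots."""
    return {
        "spatial": knn_adjacency(coords, k_spatial),
        "morphological": knn_adjacency(image_feats, k_morph),
    }


def gat_attention(z, adj, a, slope: float = 0.2):
    """Single-head GAT-style attention weights.

    z: projected node features (W h_i); a: attention vector of length 2 * len(z[0]).
    e_ij = LeakyReLU(a . [z_i || z_j]) for j in N(i) plus i itself, and
    alpha_ij = softmax over those j; entries for non-neighbours stay 0.
    """
    n = len(z)
    alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in range(n) if j == i or adj[i][j]]
        scores = []
        for j in nbrs:
            e = sum(w * v for w, v in zip(a, list(z[i]) + list(z[j])))
            scores.append(e if e > 0 else slope * e)  # LeakyReLU
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        for j, ex in zip(nbrs, exps):
            alpha[i][j] = ex / total
    return alpha
```

Keeping the two views as separate graphs, rather than merging them up front, lets each one feed its own attention module, so a spot can weight neighbours that are close in tissue differently from neighbours that merely look similar in the histology image.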
Huo Y, Guo Y, Wang J, Xue H, Feng Y, Chen W, Li X, ...
《Journal of Genetics and Genomics》