fMRI volume classification using a 3D convolutional neural network robust to shifted and scaled neuronal activations.
Deep-learning methods based on deep neural networks (DNNs) have recently been successfully utilized in the analysis of neuroimaging data. A convolutional neural network (CNN) is a type of DNN that employs a convolution kernel that covers a local area of the input sample and moves across the sample to provide a feature map for the subsequent layers. In our study, we hypothesized that a 3D-CNN model with down-sampling operations such as pooling and/or stride would have the ability to extract robust feature maps from the shifted and scaled neuronal activations in a single functional MRI (fMRI) volume for the classification of task information associated with that volume. Thus, the 3D-CNN model would be able to ameliorate the potential misalignment of neuronal activations and over-/under-activation in local brain regions caused by imperfections in spatial alignment algorithms, confounded by variability in blood-oxygenation-level-dependent (BOLD) responses across sessions and/or subjects. To this end, the fMRI volumes acquired from four sensorimotor tasks (left-hand clenching, right-hand clenching, auditory attention, and visual stimulation) were used as input for our 3D-CNN model to classify task information using a single fMRI volume. The classification performance of the 3D-CNN was systematically evaluated using fMRI volumes obtained from various minimal preprocessing scenarios applied to raw fMRI volumes that excluded spatial normalization to a template and those obtained from full preprocessing that included spatial normalization. Alternative classifier models such as the 1D fully connected DNN (1D-fcDNN) and support vector machine (SVM) were also used for comparison. The classification performance was also assessed for several k-fold cross-validation (CV) schemes, including leave-one-subject-out CV (LOOCV). Overall, the classification results of the 3D-CNN model were superior to those of the 1D-fcDNN and SVM models.
When using the fully-processed fMRI volumes with LOOCV, the mean error rates (± the standard error of the mean) for the 3D-CNN, 1D-fcDNN, and SVM models were 2.1% (± 0.9), 3.1% (± 1.2), and 4.1% (± 1.5), respectively (p = 0.041 from a one-way ANOVA). The error rates for 3-fold CV were higher (2.4% ± 1.0, 4.2% ± 1.3, and 10.1% ± 2.0; p < 0.0003 from a one-way ANOVA). The mean error rates also increased considerably using the raw fMRI 3D volume data without preprocessing (26.2% for the 3D-CNN, 75.0% for the 1D-fcDNN, and 75.0% for the SVM). Furthermore, the ability of the pre-trained 3D-CNN model to handle shifted and scaled neuronal activations was demonstrated in an online scenario for five-class classification (i.e., four sensorimotor tasks and the resting state) using the real-time fMRI of three participants. The resulting classification accuracy was 78.5% (± 1.4), 26.7% (± 5.9), and 21.5% (± 3.1) for the 3D-CNN, 1D-fcDNN, and SVM models, respectively. The superior performance of the 3D-CNN compared to the 1D-fcDNN was verified by analyzing the resulting feature maps and convolution filters that handled the shifted and scaled neuronal activations and by utilizing an independent public dataset from the Human Connectome Project.
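The core hypothesis above, that convolution followed by pooling can absorb a modest spatial shift of a neuronal activation, can be illustrated with a minimal NumPy sketch. The volume size, single all-ones kernel, and global max-pooling below are illustrative simplifications, not the study's actual 3D-CNN architecture:

```python
import numpy as np

def conv3d_valid(vol, kernel):
    """Naive valid-mode 3D cross-correlation (one filter, stride 1)."""
    kd, kh, kw = kernel.shape
    d, h, w = vol.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(vol[i:i + kd, j:j + kh, k:k + kw] * kernel)
    return out

kernel = np.ones((2, 2, 2))          # one toy convolution filter
vol = np.zeros((8, 8, 8))
vol[2:4, 2:4, 2:4] = 1.0             # a localized "neuronal activation"
shifted = np.roll(vol, 1, axis=0)    # the same activation, misaligned by one voxel

# Global max-pooling over the feature map yields the same response for both
# volumes, i.e., the pooled feature is invariant to the one-voxel shift.
f_orig = conv3d_valid(vol, kernel).max()
f_shift = conv3d_valid(shifted, kernel).max()
print(f_orig, f_shift)               # both 8.0
```

The global pool makes the invariance exact for illustration; the stride and local pooling in a real 3D-CNN provide only approximate invariance over each receptive field.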
Vu H, Kim HC, Jung M, Lee JH
Task-specific feature extraction and classification of fMRI volumes using a deep neural network initialized with a deep belief network: Evaluation using sensorimotor tasks.
Feedforward deep neural networks (DNNs), artificial neural networks with multiple hidden layers, have recently demonstrated a record-breaking performance in multiple areas of application in computer vision and speech processing. Following this success, DNNs have been applied to neuroimaging modalities including functional/structural magnetic resonance imaging (MRI) and positron-emission tomography data. However, no study has explicitly applied DNNs to 3D whole-brain fMRI volumes and thereby extracted hidden volumetric representations of fMRI that are discriminative for a task performed as the fMRI volume was acquired. Our study applied a fully connected feedforward DNN to fMRI volumes collected in four sensorimotor tasks (i.e., left-hand clenching, right-hand clenching, auditory attention, and visual stimulus) undertaken by 12 healthy participants. Using a leave-one-subject-out cross-validation scheme, a restricted Boltzmann machine-based deep belief network was pretrained and used to initialize the weights of the DNN. The pretrained DNN was fine-tuned while systematically controlling weight-sparsity levels across the hidden layers. Optimal weight-sparsity levels were determined from a minimum validation error rate of fMRI volume classification. Minimum error rates (mean ± standard deviation; %) of 6.9 (± 3.8) were obtained from the three-layer DNN with the sparsest condition of weights across the three hidden layers. These error rates were even lower than the error rates from the single-layer network (9.4 ± 4.6) and the two-layer network (7.4 ± 4.1). The estimated DNN weights showed spatial patterns that are remarkably task-specific, particularly in the higher layers. The output values of the third hidden layer represented distinct patterns/codes of the 3D whole-brain fMRI volume and encoded the information of the tasks as evaluated from representational similarity analysis.
Our reported findings show the ability of the DNN to classify a single fMRI volume based on the extraction of hidden representations of fMRI volumes associated with tasks across multiple hidden layers. Our study may be beneficial to the automatic classification/diagnosis of neuropsychiatric and neurological diseases and prediction of disease severity and recovery in (pre-) clinical settings using fMRI volumes without requiring an estimation of activation patterns or ad hoc statistical evaluation.
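The leave-one-subject-out cross-validation scheme described above can be sketched in plain Python; the subject IDs below are hypothetical placeholders for the 12 participants:

```python
# Hypothetical subject IDs standing in for the 12 healthy participants.
subjects = [f"S{i:02d}" for i in range(1, 13)]

def loso_splits(subjects):
    """Yield (train_subjects, held_out_subject) pairs, one fold per subject."""
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, held_out

splits = list(loso_splits(subjects))
print(len(splits))  # 12 folds, one per held-out subject
```

Each fold trains on 11 subjects and tests on the one left out, so the reported error rate reflects generalization to unseen subjects rather than unseen volumes from seen subjects.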
Jang H, Plis SM, Calhoun VD, Lee JH
A Multichannel 2D Convolutional Neural Network Model for Task-Evoked fMRI Data Classification.
Deep learning models have been successfully applied to the analysis of various functional MRI data. Convolutional neural networks (CNNs), a class of deep neural networks, have been found to excel at extracting local meaningful features based on their shared-weights architecture and space-invariance characteristics. In this study, we propose M2D CNN, a novel multichannel 2D CNN model, to classify 3D fMRI data. The model uses sliced 2D fMRI data as input and integrates multichannel information learned from 2D CNN networks. We experimentally compared the proposed M2D CNN against several widely used models including SVM, 1D CNN, 2D CNN, 3D CNN, and 3D separable CNN with respect to their performance in classifying task-based fMRI data. We tested M2D CNN against six models as benchmarks to classify a large number of time-series whole-brain imaging data based on a motor task in the Human Connectome Project (HCP). The results of our experiments demonstrate the following: (i) convolution operations in the CNN models are advantageous for high-dimensional whole-brain imaging data classification, as all CNN models outperform SVM; (ii) 3D CNN models achieve higher accuracy than the 2D CNN and 1D CNN models, but are computationally costly because of the extra dimension added to the input; (iii) the M2D CNN model proposed in this study achieves the highest accuracy and alleviates data overfitting given its smaller number of parameters compared with the 3D CNN.
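A minimal NumPy sketch of the multichannel idea: one filter spans all axial slices of a volume as channels while convolving only over the two in-plane dimensions. The volume and kernel sizes are hypothetical, not the paper's actual configuration:

```python
import numpy as np

def conv2d_multichannel(x, kernels):
    """x: (C, H, W) volume with slices as channels; kernels: (C, kh, kw).
    Returns one 2D feature map (valid mode), summed over channels."""
    c, h, w = x.shape
    _, kh, kw = kernels.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for ch in range(c):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] += np.sum(x[ch, i:i + kh, j:j + kw] * kernels[ch])
    return out

rng = np.random.default_rng(0)
vol = rng.random((4, 8, 8))          # hypothetical volume: 4 slices, 8x8 in-plane
kernels = np.ones((4, 3, 3)) / 36.0  # one multichannel averaging filter
fmap = conv2d_multichannel(vol, kernels)
print(fmap.shape)                    # (6, 6): a 2D map integrating all slices
```

The filter slides only over height and width, so the output stays 2D while still mixing information across the slice (channel) axis, which is what keeps the parameter and activation footprint below that of a full 3D convolution stack.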
Hu J, Kuang Y, Liao B, Cao L, Dong S, Li P
Classification of schizophrenia and normal controls using 3D convolutional neural network and outcome visualization.
The recent deep learning-based studies on the classification of schizophrenia (SCZ) using MRI data rely on manual extraction of feature vectors, which destroys the 3D structure of MRI data. In order both to identify SCZ and to find relevant biomarkers, preserving the 3D structure in the classification pipeline is critical.
The present study investigated whether the proposed 3D convolutional neural network (CNN) model produces higher accuracy than the support vector machine (SVM) and other 3D-CNN models in distinguishing individuals with SCZ spectrum disorders (SSDs) from healthy controls. We also sought to construct a saliency map using the class saliency visualization (CSV) method.
Task-based fMRI data were obtained from 103 patients with SSDs and 41 normal controls. To preserve spatial locality, we used the 3D activation map as input for the 3D convolutional autoencoder (3D-CAE)-based CNN model. Data on 62 patients with SSDs were used for unsupervised pretraining with the 3D-CAE. Data on the remaining 41 patients and 41 normal controls were used for training and testing the CNN. The performance of our model was analyzed and compared with that of the SVM and other 3D-CNN models. The learned CNN model was visualized using the CSV method.
Using task-based fMRI data, our model achieved classification accuracies of 84.15%-84.43%, outperforming the SVM and other 3D-CNN models. The inferior and middle temporal lobes were identified as key regions for classification.
Our findings suggest that the proposed 3D-CAE-based CNN can classify patients with SSDs and controls with higher accuracy compared to other models. Visualization of salient regions provides important clinical information.
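For intuition on the CSV step: class saliency visualization (in the sense of Simonyan et al.) takes the magnitude of the class-score gradient with respect to the input, and for a purely linear score s(x) = w·x + b that gradient is simply w. A toy sketch with hypothetical dimensions, not the study's trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4 * 4 * 4)   # hypothetical weights of a linear class score

# The gradient of s(x) = w.x + b with respect to x is w, so the saliency map
# is |w|, reshaped back to the input volume's 3D geometry. High-magnitude
# voxels are the ones the classifier is most sensitive to.
saliency = np.abs(w).reshape(4, 4, 4)
top_voxels = np.argsort(saliency, axis=None)[::-1]  # most salient voxels first
print(saliency.shape)
```

For a deep nonlinear model the gradient is obtained by backpropagation rather than read off directly, but the interpretation of the resulting map is the same.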
Oh K, Kim W, Shen G, Piao Y, Kang NI, Oh IS, Chung YC
Deep neural network predicts emotional responses of the human brain from functional magnetic resonance imaging.
An artificial neural network with multiple hidden layers (known as a deep neural network, or DNN) was employed as a predictive model (DNNp) for the first time to predict emotional responses using whole-brain functional magnetic resonance imaging (fMRI) data from individual subjects. During fMRI data acquisition, 10 healthy participants listened to 80 International Affective Digitized Sounds (IADS) stimuli and rated their own emotions generated by each sound stimulus in terms of the arousal, dominance, and valence dimensions. The whole-brain spatial patterns from a general linear model (i.e., beta-valued maps) for each sound stimulus and the emotional response ratings were used as the input and output for the DNNp, respectively. Based on a nested five-fold cross-validation scheme, the paired input and output data were divided into training (three-fold), validation (one-fold), and test (one-fold) data. The DNNp was trained and optimized using the training and validation data and was tested using the test data. The Pearson's correlation coefficients between the rated and predicted emotional responses from our DNNp model with weight sparsity optimization (mean ± standard error 0.52 ± 0.02 for arousal, 0.51 ± 0.03 for dominance, and 0.51 ± 0.03 for valence, with an input denoising level of 0.3 and a mini-batch size of 1) were significantly greater than those of DNN models with conventional regularization schemes including elastic net regularization (0.15 ± 0.05, 0.15 ± 0.06, and 0.21 ± 0.04 for arousal, dominance, and valence, respectively), those of shallow models including logistic regression (0.11 ± 0.04, 0.10 ± 0.05, and 0.17 ± 0.04 for arousal, dominance, and valence, respectively; average of logistic regression and sparse logistic regression), and those of support vector machine-based predictive models (SVMps; 0.12 ± 0.06, 0.06 ± 0.06, and 0.10 ± 0.06 for arousal, dominance, and valence, respectively; average of linear and non-linear SVMps).
This difference was confirmed to be significant with a Bonferroni-corrected p-value of less than 0.001 from a one-way analysis of variance (ANOVA) and subsequent paired t-tests. The weights of the trained DNNps were interpreted, and the input patterns that maximized or minimized the output of the DNNps (i.e., the emotional responses) were estimated. Based on a binary classification of each emotion category (e.g., high arousal vs. low arousal), the error rates for the DNNp (31.2% ± 1.3% for arousal, 29.0% ± 1.7% for dominance, and 28.6% ± 3.0% for valence) were significantly lower than those for the linear SVMp (44.7% ± 2.0%, 50.7% ± 1.7%, and 47.4% ± 1.9% for arousal, dominance, and valence, respectively) and the non-linear SVMp (48.8% ± 2.3%, 52.2% ± 1.9%, and 46.4% ± 1.3% for arousal, dominance, and valence, respectively), as confirmed by a Bonferroni-corrected p < 0.001 from the one-way ANOVA. Our study demonstrates that the DNNp model is able to reveal neuronal circuitry associated with human emotional processing, including structures in the limbic and paralimbic areas such as the amygdala, prefrontal areas, anterior cingulate cortex, insula, and caudate. Our DNNp model was also able to use activation patterns in these structures to predict and classify emotional responses to stimuli.
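The Pearson correlation used above to score predicted against rated responses is straightforward to compute; the five rating pairs below are made-up illustrative values, not data from the study:

```python
import numpy as np

def pearson_r(a, b):
    """Pearson's correlation coefficient between two 1D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

rated = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # hypothetical subject ratings
predicted = np.array([1.1, 1.9, 3.2, 3.8, 5.1])  # hypothetical model outputs
r = pearson_r(rated, predicted)
print(round(r, 3))
```

Because the coefficient is computed on mean-centered, norm-scaled values, it rewards predictions that track the rank and relative spacing of the ratings rather than their absolute scale.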
Kim HC, Bandettini PA, Lee JH