t-SNE
  • Article type: Journal Article
    Measuring the chemical composition of soybeans is time-consuming and laborious, and even simple near-infrared sensors generally require calibration curves to be created before use. In this study, we investigated a calibration-free screening method for soybeans that combines the excitation emission matrix (EEM) with dimensionality reduction analysis. The EEMs of 34 soybean samples were measured, and representative chemical components (crude protein, crude oil, and isoflavones) were quantified by chemical analysis. Two dimensionality reduction methods, principal component analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE), were applied to the EEM data to obtain two-dimensional plots, which were divided into two regions corresponding to high and low amounts of each chemical component. Machine learning classification models were then constructed on these two-dimensional plots to classify each chemical component as high or low. Classification accuracy was higher with t-SNE than with the PC1-PC2 combination from PCA, and with t-SNE it exceeded 90% for all chemical components. These results suggest that t-SNE dimensionality reduction of soybean EEMs has potential for simple and accurate screening of soybeans, especially based on isoflavone content.
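The abstract does not name the classifier built on the two-dimensional plots. As a minimal sketch, assuming a k-nearest-neighbour vote (a stand-in, not necessarily the authors' model), the following shows how high/low labels can be predicted from 2-D embedding coordinates and scored with leave-one-out accuracy. The coordinates and the `knn_predict` / `leave_one_out_accuracy` helpers are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify a 2-D query point by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(points, labels, k=3):
    """Hold out each sample in turn and classify it from all remaining samples."""
    correct = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, points[i], k) == labels[i]:
            correct += 1
    return correct / len(points)


# Hypothetical 2-D embedding coordinates (e.g. t-SNE output) with high/low labels
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (5.0, 5.0), (5.4, 4.8), (4.9, 5.5)]
labels = ["low", "low", "low", "high", "high", "high"]
print(leave_one_out_accuracy(points, labels, k=3))
```

Leave-one-out is a reasonable choice with only 34 samples, because every sample is used for testing exactly once.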






  • Article type: Journal Article
    Dendrobium, a highly effective traditional Chinese medicinal herb, varies significantly in efficacy and price among varieties, so efficient classification of Dendrobium varieties is crucial. However, most existing identification methods for Dendrobium cannot be both non-destructive and efficient, which makes it difficult for them to meet the needs of industrial production. In this study, we combined Laser-Induced Breakdown Spectroscopy (LIBS) with multivariate models to classify 10 varieties of Dendrobium. LIBS spectral data for each variety were collected from three circular medicinal blocks. During data analysis, the LIBS spectra were first preprocessed using Gaussian filtering and stacked correlation coefficient feature selection, and the constructed fusion model was then used for classification. The classification accuracy for the 10 Dendrobium varieties reached 100%. Compared with Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN), our method improved classification accuracy by 14%, 20%, and 20%, respectively, and it outperformed the same three models combined with Principal Component Analysis (PCA) by 10%, 10%, and 17%, respectively. These results validate the performance of the proposed classification method. Finally, t-distributed Stochastic Neighbor Embedding (t-SNE) visualization of the entire analysis pipeline further improves the interpretability of the model. By combining LIBS with machine learning, this study achieves efficient classification of Dendrobium and provides a feasible solution for identifying Dendrobium and, potentially, other traditional Chinese medicinal herbs.
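The preprocessing described above (Gaussian filtering, then correlation-based feature selection) can be sketched as follows. This is a minimal illustration under stated assumptions: the "stacked" variant used by the authors is not described in the abstract, so a plain ranking of spectral channels by absolute Pearson correlation with a one-vs-rest target is used instead. All data and helper names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian weights covering roughly +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(spectrum, sigma=1.0):
    """Convolve a spectrum with a Gaussian kernel, repeating edge values as padding."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(spectrum)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp index at the borders
            acc += w * spectrum[idx]
        smoothed.append(acc)
    return smoothed


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_features(spectra, target, top_k):
    """Indices of the top_k spectral channels ranked by |r| with a numeric target."""
    scores = []
    for f in range(len(spectra[0])):
        column = [s[f] for s in spectra]
        scores.append((abs(pearson(column, target)), f))
    scores.sort(reverse=True)
    return sorted(f for _, f in scores[:top_k])


# Hypothetical: 4 spectra x 3 channels, one-vs-rest target (1 = variety of interest)
spectra = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0], [4.0, 5.0, 1.0]]
target = [0, 0, 1, 1]
smoothed = [gaussian_smooth(s, sigma=0.5) for s in spectra]
print(select_features(smoothed, target, top_k=1))
```

For a 10-class problem, one way to use this filter is to run it once per variety (one-vs-rest) and pool the selected channels. That pooling step is an assumption, not the published procedure.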






  • Article type: Journal Article
    Several pelvic-area cancers have high incidence rates, and their surgical treatment can cause adverse effects such as urinary and fecal incontinence, significantly impacting patients' quality of life. Post-surgical incontinence is a significant concern, with reported prevalence rates of 25-45% for urinary incontinence and 9-68% for fecal incontinence. Cancer survivors increasingly turn to YouTube to connect with others, yet caution is warranted because misinformation is prevalent.
    This study aims to evaluate the quality of information in YouTube videos about incontinence after pelvic-area cancer surgery.
    A YouTube search for "Incontinence after cancer surgery" yielded 108 videos, which were analyzed. The videos were evaluated with several quality assessment tools: DISCERN, GQS, JAMA, PEMAT, and MQ-VET. Descriptive statistics and intercorrelation tests were used to assess video attributes, including characteristics, popularity, educational value, quality, and reliability. Artificial intelligence techniques such as PCA, t-SNE, and UMAP were also used for data analysis, and heat maps and hierarchical clustering dendrograms were used to corroborate the machine learning results.
    The quality scales were highly correlated with one another (p < 0.01), and the artificial intelligence-based techniques produced clear clusterings of the dataset samples, which were reinforced by the heat map and the hierarchical clustering dendrogram.
    YouTube videos on "Incontinence after Cancer Surgery" were rated as "high" quality across multiple scales. AI tools such as PCA, t-SNE, and UMAP are highlighted as useful for clustering large health datasets, improving data visualization, pattern recognition, and complex healthcare analysis.
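The abstract reports intercorrelation between quality scales but does not name the test. Because instruments such as DISCERN and GQS produce ordinal scores, a rank correlation is a natural choice. The following minimal sketch assumes Spearman's rho with average ranks for ties and computes only the coefficient, not its p-value. The score vectors are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five videos on two instruments
discern = [30, 45, 52, 61, 70]
gqs = [2, 3, 3, 4, 5]
print(round(spearman(discern, gqs), 4))
```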






  • Article type: Journal Article
    Labelling medical images is an arduous and costly task that requires clinical expertise and large numbers of qualified images. Insufficient samples can lead to underfitting during training and poor performance of supervised learning models. In this study, we aimed to develop a SimCLR-based semi-supervised learning framework to classify colorectal neoplasia according to the NICE classification. The framework was first trained with self-supervised learning on a large unlabelled dataset and then fine-tuned on a limited labelled dataset annotated with the NICE classification. The model was evaluated on an independent dataset and compared with supervised transfer learning models and with endoscopists in terms of accuracy, Matthews correlation coefficient (MCC), and Cohen's kappa. Finally, Grad-CAM and t-SNE were applied to visualize the models' interpretations. A SimCLR model with a ResNet backbone (accuracy 0.908, MCC 0.862, Cohen's kappa 0.896) outperformed the supervised transfer learning models (means: 0.803, 0.698, and 0.742) and junior endoscopists (0.816, 0.724, and 0.863), while performing only slightly worse than senior endoscopists (0.916, 0.875, and 0.944). Moreover, t-SNE showed better clustering of samples from the three classes with self-supervised learning in SimCLR than with supervised transfer learning. Compared with traditional supervised learning, semi-supervised learning enables deep learning models to achieve better performance with limited labelled endoscopic images.
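The three evaluation metrics above can be computed directly from a confusion matrix. The following sketch implements accuracy, the multiclass (Gorodkin) generalisation of MCC, and Cohen's kappa for a three-class problem. The class names and predictions are hypothetical placeholders for NICE types 1-3, not data from the study.

```python
import math


def confusion_matrix(y_true, y_pred, classes):
    """Rows = true class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def agreement_metrics(y_true, y_pred, classes):
    """Return (accuracy, multiclass MCC, Cohen's kappa) for a K-class problem."""
    m = confusion_matrix(y_true, y_pred, classes)
    k = len(classes)
    n = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(k))
    true_counts = [sum(m[i]) for i in range(k)]                       # row sums
    pred_counts = [sum(m[i][j] for i in range(k)) for j in range(k)]  # column sums
    chance = sum(t * p for t, p in zip(true_counts, pred_counts))

    accuracy = correct / n
    # Multiclass MCC (Gorodkin's generalisation of the binary formula)
    denom = math.sqrt((n * n - sum(p * p for p in pred_counts))
                      * (n * n - sum(t * t for t in true_counts)))
    mcc = (correct * n - chance) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    expected = chance / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return accuracy, mcc, kappa


# Hypothetical ternary labels standing in for NICE types 1-3
classes = ["NICE1", "NICE2", "NICE3"]
y_true = ["NICE1", "NICE1", "NICE2", "NICE2", "NICE3", "NICE3"]
y_pred = ["NICE1", "NICE2", "NICE2", "NICE2", "NICE3", "NICE1"]
print(agreement_metrics(y_true, y_pred, classes))
```

Unlike raw accuracy, MCC and kappa both discount agreement that the class marginals alone would produce, which matters when NICE classes are imbalanced.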






  • Article type: Journal Article
    Using multi-color flow cytometry, we studied the immunophenotypic differences between leukemic cells from patients with AML/MDS and hematopoietic stem and progenitor cells (HSPCs) from patients in complete remission (CR) after successful treatment. The marker panel included CD34, CD38, CD45RA, and CD123, which represent a hierarchical HSPC classification, as well as programmed death ligand 1 (PD-L1). Rather than restricting the evaluation to a 2- or 3-dimensional analysis, we applied t-distributed stochastic neighbor embedding (t-SNE) to gain deeper insight into, and better separation between, leukemic cells and normal HSPCs. To this end, we created a t-SNE map that visualized 27 cell clusters grouped by similarity in the composition and intensity of antigen expression. Two of these clusters were "leukemia-related": one contained a large proportion of CD34+/CD38- hematopoietic stem cells (HSCs), and the other contained CD34+ cells with strong co-expression of CD45RA and CD123. CD34+ cells within the latter cluster were also highly positive for PD-L1, reflecting their immunosuppressive capacity. Beyond this proof-of-principle study, including additional markers will help refine the distinction between normal HSPCs and leukemic cells, particularly for minimal disease detection and antigen-targeted therapeutic interventions. Furthermore, we propose a protocol for quantitatively assigning new cell ensembles using a single numerical value, the Pearson correlation coefficient, obtained by comparing their t-SNE pattern with a reference pattern.
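The abstract does not say how a t-SNE pattern is turned into numbers before the Pearson comparison. A minimal sketch, assuming both maps share the same coordinate bounds and are binned into a common grid of relative frequencies, could look like the following. The grid size, the bounds, and the helper names are hypothetical.

```python
import math


def density_grid(points, bounds, bins=10):
    """Bin 2-D t-SNE coordinates into a flattened bins x bins grid of relative frequencies."""
    (xmin, xmax), (ymin, ymax) = bounds
    grid = [0.0] * (bins * bins)
    for x, y in points:
        fx = (x - xmin) / (xmax - xmin)
        fy = (y - ymin) / (ymax - ymin)
        if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
            continue  # ignore points outside the shared map bounds
        i = min(int(fx * bins), bins - 1)
        j = min(int(fy * bins), bins - 1)
        grid[i * bins + j] += 1
    total = sum(grid) or 1.0
    return [c / total for c in grid]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def pattern_similarity(sample, reference, bounds, bins=10):
    """Pearson r between the binned t-SNE patterns of a sample and a reference."""
    return pearson(density_grid(sample, bounds, bins), density_grid(reference, bounds, bins))


# Hypothetical t-SNE coordinates on a shared map
bounds = ((0.0, 10.0), (0.0, 10.0))
reference = [(1.2, 1.1), (1.5, 1.4), (8.1, 8.3), (8.6, 8.2)]
sample = [(1.2, 1.1), (1.3, 1.2), (1.4, 1.6), (8.5, 8.5)]
print(round(pattern_similarity(sample, reference, bounds), 3))
```

Binning both maps on the same grid matters here: the comparison is only meaningful if new cell ensembles are embedded into (or projected onto) the same t-SNE coordinate frame as the reference.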






  • Article type: Journal Article
    Water distribution networks (WDNs) experience significant water loss due to leaks, which calls for advanced leak detection methods. However, machine learning-based acoustic methods rely heavily on signal information and are limited by the scarcity and limited diversity of available data. To address this challenge and improve water leak detection in WDNs, this study proposes an LSTM-GAN approach. Acoustic signals collected from WDNs are used to train the LSTM-GAN model, which generates synthetic leak signals to augment the dataset. The validity of the generative method is evaluated through t-SNE and an analysis of acoustic characteristics. LSTM-based water leak detection models are built on the original and the augmented datasets and compared to confirm that the generated samples improve leak detection performance. The capability of the LSTM-GAN is further evaluated from different perspectives, including sensitivity analysis and model comparison. The results confirm the quality and consistency of the generated acoustic signals under leak conditions. In addition, the optimal number of generated samples should be determined according to the requirements and characteristics of the leak detection task. Furthermore, a comparison with other acoustic generative methods demonstrates the superiority of LSTM-GAN-generated signals in improving the performance of leak detection models. The proposed generative method offers an innovative way to support machine learning-based leak detection models with limited data, thereby enhancing their robustness.
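The abstract mentions an "acoustic characteristics analysis" of real versus generated signals without listing the characteristics. As a minimal sketch, assuming root-mean-square amplitude and zero-crossing rate as two simple descriptors (illustrative choices, not the authors' feature set), the following compares the distributions of those features between two sets of signals. The toy sine signals are hypothetical.

```python
import math
import statistics


def rms(signal):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(signal) - 1)


def feature_summary(signals):
    """(mean, population std) of each feature across a set of signals."""
    feats = {
        "rms": [rms(s) for s in signals],
        "zcr": [zero_crossing_rate(s) for s in signals],
    }
    return {name: (statistics.mean(v), statistics.pstdev(v)) for name, v in feats.items()}


# Hypothetical "real" and "generated" leak signals (toy sinusoids)
real = [[0.5 * math.sin(0.3 * t) for t in range(200)],
        [0.6 * math.sin(0.31 * t) for t in range(200)]]
generated = [[0.55 * math.sin(0.29 * t) for t in range(200)]]
real_summary, gen_summary = feature_summary(real), feature_summary(generated)
for name in ("rms", "zcr"):
    print(name, real_summary[name], gen_summary[name])
```

Close feature means and spreads are necessary but not sufficient evidence of realistic synthetic data. This is why the abstract pairs such checks with a t-SNE comparison and with downstream detection performance.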






  • Article type: Journal Article
    Classifying big data in hyperspectral imaging (HSI) can be challenging when minor (low-concentration) compounds are present in real samples, such as chemical additives and adulterants in a food matrix. Herein, we propose, for the first time, a strategy for classifying HSI data to identify adulterants in food materials. The strategy, named ESPs-UMAP-HCA, selects the essential spectral pixels (ESPs) of the full HSI data, constructs a feature space using uniform manifold approximation and projection (UMAP), and then clusters the reduced data with hierarchical clustering analysis (HCA). We applied the approach to two real NIR datasets and four new Raman datasets. Compared with UMAP-HCA without ESP selection and with t-distributed stochastic neighbor embedding combined with ESPs and HCA (ESPs-t-SNE-HCA), the developed strategy produces well-separated clusters for both major and minor compounds in the food matrix. The adulterants, present as minor compounds, were accurately identified, as confirmed by the match between their extracted spectra and their pure spectra. In addition, their locations were found in the contribution map even though they occupy only a few pixels. Moreover, the strategy requires no a priori knowledge of the data structure or class memberships, which reduces analytical difficulty and confirmation bias in the analysis of large HSI datasets. Overall, the proposed ESPs-UMAP-HCA method is a potential approach for detecting food adulteration.
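The final HCA step operates on the low-dimensional UMAP coordinates. The following naive agglomerative clustering sketch assumes those 2-D coordinates are already computed and uses average linkage, since the abstract does not state the linkage. It shows how a handful of outlying pixels (standing in for a minor compound) can end up in their own cluster when the tree is cut at three clusters. The coordinates are hypothetical.

```python
import math


def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage; returns one label per point."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of two clusters
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge cluster y into cluster x
        del clusters[y]                          # y > x, so index x stays valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical 2-D UMAP coordinates: two major components and one minor outlying pixel
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.9),
          (10.0, 0.0)]
print(average_linkage(points, n_clusters=3))
```

This brute-force version is cubic in the number of points, so it is only suitable for illustration or for an already-reduced set of essential pixels. Production code would typically use an optimised library implementation.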






  • Article type: Journal Article
    Characterizing the connectomic and morphological diversity of thalamic neurons is key to better understanding how the thalamus relays sensory inputs to the cortex. The recent public release of complete single-neuron morphological reconstructions enables the analysis of previously inaccessible connectivity patterns of individual neurons. Here we focus on the Ventral Posteromedial (VPM) nucleus and characterize the full diversity of 257 VPM neurons, obtained by combining data from the MouseLight and Braintell projects. Neurons were clustered according to the cortical area they target most strongly and further subdivided by their jointly targeted areas. We obtained a 2D embedding of morphological diversity using the dissimilarity between all pairs of axonal trees. The curved shape of the embedding allowed us to characterize neurons by a 1-dimensional coordinate. The coordinate values were aligned both with the progression of soma position along the dorsal-ventral and lateral-medial axes and with that of axonal terminals along the posterior-anterior and medial-lateral axes, as well as with an increase in the number of branching points, distance from soma, and branching width. Taken together, we have developed a novel workflow for linking three challenging aspects of connectomics, namely topography, higher-order connectivity patterns, and morphological diversity, with VPM as a test case. The workflow is linked to a unified access portal that contains the morphologies and is integrated with 2D cortical flatmap and subcortical visualization tools. The workflow and the resulting processed data have been made available in Python and can thus be used for modeling and for experimentally validating new hypotheses on thalamocortical connectivity.
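The abstract states that a 2D embedding was computed from pairwise dissimilarities between axonal trees, but it does not name the embedding method. One standard way to embed a dissimilarity matrix is classical multidimensional scaling (MDS). The following pure-Python sketch (an assumed stand-in, not the authors' method) double-centres the squared distances and extracts the top eigenvectors by power iteration with deflation. Power iteration finds the eigenvalue of largest magnitude, so this is only reliable when the dissimilarities are close to Euclidean, which keeps the centred matrix near positive semidefinite.

```python
import math
import random


def double_center(d):
    """B = -1/2 * J D^2 J: the Gram matrix implied by a symmetric distance matrix."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]  # column means equal row means by symmetry
    total = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)] for i in range(n)]


def top_eigenpair(b, iters=500, seed=0):
    """Dominant eigenvalue/eigenvector of a symmetric matrix via power iteration."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * c for a, c in zip(v, bv)), v  # Rayleigh quotient


def classical_mds(d, dims=2):
    """Embed points in `dims` dimensions so Euclidean distances approximate d."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))  # clip negative eigenvalues (non-Euclidean part)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next power iteration finds the next eigenpair
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical dissimilarities: distances between corners of a 3 x 4 rectangle
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
d = [[math.dist(p, q) for q in pts] for p in pts]
print(classical_mds(d, dims=2))
```

The recovered coordinates are unique only up to rotation, reflection, and translation, which is why a check like the one below compares pairwise distances rather than raw coordinates.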






  • Article type: Journal Article
    Emotion recognition (ER) plays a crucial role in enabling machines to perceive human emotional and psychological states, thus enhancing human-machine interaction. Recently, there has been growing interest in ER based on electroencephalogram (EEG) signals. However, because EEG signals are noisy, nonlinear, and nonstationary, developing an automatic, high-accuracy ER system remains challenging. In this study, we propose a pretrained deep residual convolutional neural network with 17 convolutional layers and one fully connected layer, combined with transfer learning. The network takes as input two-dimensional frequency-channel matrices (FCMs), which are computed from the one-dimensional EEG data using Welch power spectral density estimation. The goal is to improve ER by automatically learning the intrinsic features of multi-channel EEG data. Using 5-fold cross-validation on the DEAP dataset, the model achieved a mean accuracy of 93.61 ± 0.84%, a mean precision of 94.70 ± 0.60%, a mean sensitivity of 95.13 ± 1.02%, a mean specificity of 91.04 ± 1.02%, and a mean F1-score of 94.91 ± 0.68%. To better understand how the proposed model works, we used t-distributed stochastic neighbor embedding (t-SNE) to compare how well FCMs of the same category cluster at different layers. Clustering was best for the softmax layer activations, second best for the middle convolutional layer activations, and worst for the early max pooling layer activations. These findings confirm the promising potential of combining deep learning with transfer learning and FCMs for effective ER.
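An FCM is described as a two-dimensional matrix built from per-channel Welch power spectral density estimates. The following pure-Python sketch computes a one-sided Welch PSD (periodic Hann window, 50% overlap, per-segment mean removal, density scaling) for each channel and stacks the results into a channels-by-frequencies matrix. The segment length, sampling rate, and any frequency-band cropping used by the authors are not given in the abstract, so the values here are assumptions.

```python
import cmath
import math


def welch_psd(x, fs, nperseg=64):
    """One-sided Welch PSD: periodic Hann window, 50% overlap, density scaling."""
    if len(x) < nperseg:
        raise ValueError("signal shorter than one segment")
    step = nperseg // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nperseg) for n in range(nperseg)]
    scale = 1.0 / (fs * sum(w * w for w in window))
    n_freqs = nperseg // 2 + 1
    psd = [0.0] * n_freqs
    n_segments = 0
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        mean = sum(seg) / nperseg
        seg = [(s - mean) * w for s, w in zip(seg, window)]  # remove DC, then taper
        for k in range(n_freqs):
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / nperseg)
                        for n in range(nperseg))
            power = abs(coeff) ** 2 * scale
            if 0 < k < nperseg // 2:  # fold negative frequencies in (even nperseg)
                power *= 2
            psd[k] += power
        n_segments += 1
    freqs = [k * fs / nperseg for k in range(n_freqs)]
    return freqs, [p / n_segments for p in psd]


def frequency_channel_matrix(channels, fs, nperseg=64):
    """Stack per-channel Welch PSDs into an (n_channels x n_freqs) matrix."""
    return [welch_psd(ch, fs, nperseg)[1] for ch in channels]


# Hypothetical two-channel recording sampled at 128 Hz with a 10 Hz rhythm
fs = 128
sig = [math.sin(2 * math.pi * 10 * n / fs) for n in range(512)]
freqs, psd = welch_psd(sig, fs, nperseg=64)
print(freqs[psd.index(max(psd))])
fcm = frequency_channel_matrix([sig, sig], fs, nperseg=64)
```

The naive DFT keeps the sketch dependency-free. In practice, an FFT-based routine would be used for realistic channel counts and trial lengths.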






  • Article type: Journal Article
    Falls among the elderly pose considerable health hazards, causing not only physical injury but also a range of related problems. A timely alert about deteriorating gait, as an indication of an impending fall, can help prevent falls. In this investigation, a comprehensive comparative analysis was conducted between a commercially available mobile phone system and two wristband systems: one commercially available and one representing a novel approach. Each system was equipped with a single three-axis accelerometer. Gait suggestive of a potential fall was induced by special glasses worn by the participants. The same standard machine learning techniques were used for classification with all three systems. The best average accuracy of 86%, with a specificity of 88% and a sensitivity of 86%, was achieved by the support vector machine (SVM) method using a wristband. The smartphone, by contrast, achieved a best average accuracy of 73%, also with an SVM using only its three-axis accelerometer. Significance analysis of the mean accuracy, sensitivity, and specificity between the innovative wristband and the smartphone yielded p < 0.001. Furthermore, the study applied unsupervised and semi-supervised learning methods, incorporating principal component analysis and t-distributed stochastic neighbor embedding. In summary, both wristbands demonstrated the usability of wearable sensors for the early detection and mitigation of falls in the elderly, outperforming the smartphone.
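The reported accuracy, sensitivity, and specificity come from classifying windows of three-axis accelerometer data. The following sketch assumes per-window mean and standard deviation of acceleration magnitude as features (the study's actual features are not listed in the abstract) and shows how the three metrics are computed, with "impaired" gait as the positive class. The window length, labels, and samples are hypothetical.

```python
import math
import statistics


def window_features(samples, window=50):
    """Mean and population std of acceleration magnitude per non-overlapping window."""
    features = []
    for start in range(0, len(samples) - window + 1, window):
        mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples[start:start + window]]
        features.append((statistics.mean(mags), statistics.pstdev(mags)))
    return features


def binary_metrics(y_true, y_pred, positive="impaired"):
    """Accuracy, sensitivity (recall of the positive class), and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical window labels and classifier predictions
y_true = ["impaired", "impaired", "normal", "normal", "normal"]
y_pred = ["impaired", "normal", "normal", "normal", "impaired"]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity separately is useful here because missing an impaired gait (a false negative) and raising a false alarm have very different costs in fall prevention.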





