Uniform Manifold Approximation and Projection (UMAP)

  • Article type: Journal Article
    The dysfunction of α and β cells in pancreatic islets can lead to diabetes. Many questions remain about the subcellular organization of islet cells during disease progression. Existing three-dimensional cellular mapping approaches face challenges such as time-intensive sample sectioning and subjective cellular identification. To address these challenges, we have developed a subcellular feature-based classification approach, which allows us to identify α and β cells and quantify their subcellular structural characteristics using soft X-ray tomography (SXT). We observed significant differences in whole-cell morphological and organelle statistics between the two cell types. Additionally, we characterize subtle biophysical differences between individual insulin and glucagon vesicles by analyzing vesicle size and molecular density distributions, an analysis that was not previously possible using other methods. These sub-vesicular parameters enable us to predict cell types systematically using supervised machine learning. We also visualize distinct vesicle and cell subtypes using Uniform Manifold Approximation and Projection (UMAP) embeddings, which give us a new way to explore structural heterogeneity in islet cells. This methodology presents an innovative approach for tracking biologically meaningful heterogeneity in cells that can be applied to any cellular system.

  • Article type: Journal Article
    Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we used machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.
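
    The abstract above does not say which statistic was used to quantify inter-observer reliability. As one common possibility, the sketch below computes Cohen's kappa for a pair of observers. The choice of kappa, the function name `cohens_kappa`, and the labels "A" and "B" are illustrative assumptions, not the study's method or data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two observers (Cohen's kappa).

    Assumption: this statistic is used only for illustration; the abstract
    does not name the reliability measure the authors applied.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two observers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each observer's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both observers always used the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four pulses scored by two observers.
observer_1 = ["A", "A", "B", "B"]
observer_2 = ["A", "A", "B", "A"]
kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 3))
```

    With three observers, the pairwise values could be averaged, or a multi-rater statistic such as Fleiss' kappa could be used instead.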

  • Article type: Journal Article
    Owing to extreme conditions tied to landfill location, pollutant release has recently been shown to be more severe in estuary landfills, which are affected by both seawater and river-water interactions. To identify geographic and environmental features linked to the extreme conditions of certain landfills, a high-dimensional clustering method combining Uniform Manifold Approximation and Projection (UMAP) with the Louvain algorithm is proposed. A case study was conducted using 17 noteworthy features that are transformed into a Landfill Suitability Index (LSI), applied to hundreds of landfill sites in Taiwan. This study grouped the landfills into 10 clusters and identified several clusters in markedly extreme locations, including estuary landfills (7.9%), fault-water-body landfills (8.2%), and densely-populated-water-body landfills (17.6%). Furthermore, habitats of the endangered Platalea minor were discovered near these estuary landfills. Additionally, this work identified "healthy" landfills (11.2%) that are minimally affected by the considered features. These findings demonstrate the potential of our framework to help managers systematically improve landfill management strategies. Moreover, our framework was tested by incorporating rainfall and flooding features in relation to climate change scenarios. To address the demand for land release from occupied landfills in Taiwan, there is a pressing need to expedite the transition to a circular economy, and our framework can provide further assistance in this regard. This approach is promising, as it provides a new method to evaluate the environmental risks linked to landfills and also identifies potential opportunities related to landfill mining. Finally, this work was extended to include a case study in England, which has 19,801 landfills and a dataset containing 15 relevant landfill features; in this case study, our framework identified 110 landfill clusters, several of them in extreme locations, demonstrating that our framework is flexible enough for use in regions outside of Taiwan.
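
    The Louvain algorithm mentioned above groups the nodes of a graph by greedily maximizing modularity. The sketch below only evaluates that modularity score for a given partition of a small unweighted graph; it is not the full Louvain procedure and not the authors' pipeline. The function name `modularity` and the toy graph (two triangles joined by one edge) are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    degree = {}
    internal = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal[c] = internal.get(c, 0) + 1
    total_degree = {}
    for node, d in degree.items():
        c = community_of[node]
        total_degree[c] = total_degree.get(c, 0) + d
    return sum(
        internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
print(modularity(toy_edges, two_groups), modularity(toy_edges, one_group))
```

    Splitting the toy graph into its two triangles scores higher than keeping all nodes in one group, which is the kind of improvement Louvain searches for. In a UMAP-plus-Louvain workflow, the graph would typically be a neighbor graph built from the feature vectors.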

  • Article type: Journal Article
    Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers often seek to generate low dimensional representations (e.g., 2D and 3D scatter plots), hoping those will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs), namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies, are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate the intrinsic dimension (ID; i.e., minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LEs), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as its robustness to hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within each MLT as a function of hyper-parameter selection. To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.
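
    As a rough illustration of the inner workings of UMAP, the sketch below builds the fuzzy neighbor graph that the method constructs before optimizing a low dimensional layout. For each point, the distance to its nearest neighbor is subtracted, a per-point scale sigma is found by bisection so that the neighbor weights sum to log2(k), and the directed weights are merged with the rule a + b - ab. It uses brute-force neighbor search, skips the layout optimization entirely, and is not the umap-learn implementation. All function names and the random test points are hypothetical.

```python
import math
import random


def nearest_neighbors(points, k):
    """Brute-force k nearest neighbors as sorted (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append(dists[:k])
    return result


def smooth_sigma(dists, rho, target, n_iter=64):
    """Bisection for sigma so that sum(exp(-(d - rho) / sigma)) == target."""
    def total(sigma):
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in dists)

    lo, hi = 0.0, 1.0
    while total(hi) < target:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def fuzzy_graph(points, k):
    """Directed and symmetrized fuzzy edge weights (assumes k >= 3)."""
    target = math.log2(k)
    directed = {}
    for i, nbrs in enumerate(nearest_neighbors(points, k)):
        dists = [d for d, _ in nbrs]
        rho = dists[0]  # distance to the nearest neighbor
        sigma = smooth_sigma(dists, rho, target)
        for d, j in nbrs:
            directed[(i, j)] = math.exp(-max(0.0, d - rho) / sigma)
    symmetric = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        symmetric[(i, j)] = symmetric[(j, i)] = a + b - a * b  # fuzzy union
    return directed, symmetric


rng = random.Random(0)
demo_points = [[rng.random() for _ in range(3)] for _ in range(20)]
demo_directed, demo_symmetric = fuzzy_graph(demo_points, k=5)
print(len(demo_directed), len(demo_symmetric))
```

    The neighbor count k plays the role of UMAP's number-of-neighbors hyper-parameter, one of the settings whose choice affects embedding quality.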

  • Article type: Journal Article
    Much scientific research focuses on enhancing automated systems that identify emotions and therefore relies on brain signals. This study focuses on how brain wave signals can be used to classify many emotional states of humans. Electroencephalography (EEG)-based affective computing predominantly focuses on emotion classification based on facial expression, speech recognition, and text-based recognition through multimodality stimuli. The proposed work aims to implement a methodology to identify and codify discrete complex emotions such as pleasure and grief in a rare psychological disorder known as alexithymia. This type of disorder is highly prevalent in unstable, fragile countries such as South Sudan, Lebanon, and Mauritius. These countries are continuously affected by civil wars and disasters and are politically unstable, leading to very poor economies and education systems. This study focuses on an adolescent age group dataset, recording physiological data while emotion is exhibited in a multimodal virtual environment. We conducted time-frequency analysis and analyzed amplitude time-series correlates, including frontal alpha symmetry, using a complex Morlet wavelet. For data visualization, we used the UMAP technique to obtain a clear, distinct view of emotions. We performed 5-fold cross validation along with 1 s window subjective classification on the dataset. We opted for traditional machine learning techniques to label complex emotions.
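
    The time-frequency step described above can be sketched with a complex Morlet wavelet, i.e. a complex sinusoid under a Gaussian envelope, that is slid along the signal to estimate power at one frequency. The sketch also computes a frontal alpha asymmetry index as ln(right power) minus ln(left power), which is a common convention but an assumption here, since the abstract does not give the formula. The sampling rate, number of cycles, function names, and synthetic sine-wave "channels" are all hypothetical.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=7):
    """Mean power of `signal` at `freq` (Hz) via a complex Morlet wavelet."""
    sigma = n_cycles / (2.0 * math.pi * freq)  # Gaussian width in seconds
    half = int(3 * sigma * fs)                 # wavelet spans +/- 3 sigma
    if len(signal) <= 2 * half:
        raise ValueError("signal is too short for this wavelet")
    wavelet = [
        cmath.exp(2j * math.pi * freq * (k / fs))
        * math.exp(-((k / fs) ** 2) / (2.0 * sigma ** 2))
        for k in range(-half, half + 1)
    ]
    norm = sum(abs(w) for w in wavelet)
    powers = []
    # Only positions where the wavelet fits entirely inside the signal.
    for n in range(half, len(signal) - half):
        acc = 0j
        for j, w in enumerate(wavelet):
            acc += signal[n - half + j] * w.conjugate()
        powers.append(abs(acc / norm) ** 2)
    return sum(powers) / len(powers)


def alpha_asymmetry(left, right, fs, alpha_freq=10.0):
    """ln(right alpha power) - ln(left alpha power); convention assumed."""
    return math.log(morlet_power(right, fs, alpha_freq)) - math.log(
        morlet_power(left, fs, alpha_freq)
    )


# Synthetic 2 s "channels": 10 Hz sines, right twice the amplitude of left.
fs = 250
t = [n / fs for n in range(2 * fs)]
left_channel = [math.sin(2 * math.pi * 10 * x) for x in t]
right_channel = [2.0 * math.sin(2 * math.pi * 10 * x) for x in t]
print(round(alpha_asymmetry(left_channel, right_channel, fs), 3))
```

    Because power scales with the square of amplitude, doubling the right-channel amplitude gives an index of ln(4), about 1.386, for these synthetic signals.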

  • Article type: Journal Article
    The rampant increase in drug-resistant tuberculosis (TB) remains a major challenge not only for treatment management but also for diagnosis, as well as drug design and development. Drug-resistant mycobacteria affect quality of life owing to delayed diagnosis and require prolonged treatment with multiple toxic drugs. The phenotypic modulations defining the immune status of an individual during tuberculosis are well established. The present study aims to explore the phenotypic changes of monocytes and dendritic cells (DCs), as well as their subsets, across the TB disease spectrum, from latency to drug-sensitive TB (DS-TB) and drug-resistant TB (DR-TB), using traditional immunophenotypic analysis and uniform manifold approximation and projection (UMAP) analysis. Our results demonstrate changes in frequencies of monocytes (classical, CD14++CD16-; intermediate, CD14++CD16+; and non-classical, CD14+/-CD16++) and DCs (HLA-DR+CD11c+ myeloid DCs, cross-presenting HLA-DR+CD14-CD141+ myeloid DCs and HLA-DR+CD14-CD16-CD11c-CD123+ plasmacytoid DCs), together with elevated Monocyte to Lymphocyte ratios (MLR)/Neutrophil to Lymphocyte ratios (NLR) and alteration of cytokine levels between DS-TB and DR-TB groups. UMAP analysis revealed significant differential expression of CD14+, CD16+, CD86+ and CD64+ on monocytes and CD123+ on DCs in the DR-TB group. Thus, our study reveals differential monocyte and DC subset frequencies among the various TB disease groups that may modulate immune responses, and it will be helpful in understanding the pathogenicity driven by Mycobacterium tuberculosis.

  • Article type: Journal Article
    Studies on the interactions between SARS-CoV-2 and humoral immunity are fundamental to developing effective therapies, including vaccines. We used polychromatic flow cytometry, coupled with unsupervised data analysis and principal component analysis (PCA), to interrogate B cells in untreated patients with COVID-19 pneumonia. COVID-19 patients displayed normal plasma levels of the main immunoglobulin classes and of antibodies against common antigens or against antigens present in common vaccines. However, we found a decreased number of total and naïve B cells, along with decreased percentages and numbers of memory switched and unswitched B cells. In contrast, IgM+ and IgM- plasmablasts were significantly increased. In vitro cell activation revealed that B lymphocytes showed a normal proliferation index and number of dividing cells per cycle. PCA indicated that B-cell number and naïve and memory B cells, but not plasmablasts, clustered with patients who were discharged, while plasma IgM level, C-reactive protein, D-dimer, and SOFA score clustered with those who died. In patients with pneumonia, the derangement of the B-cell compartment could be one of the causes of the immunological failure to control SARS-CoV-2, could have a relevant influence on several pathways, organs and systems, and must be considered when developing vaccine strategies.
