Uniform Manifold Approximation and Projection (UMAP)

  • Article type: Journal Article
    The dysfunction of α and β cells in pancreatic islets can lead to diabetes. Many questions remain about the subcellular organization of islet cells during disease progression. Existing three-dimensional cellular mapping approaches face challenges such as time-intensive sample sectioning and subjective cell identification. To address these challenges, we have developed a subcellular feature-based classification approach, which allows us to identify α and β cells and quantify their subcellular structural characteristics using soft X-ray tomography (SXT). We observed significant differences in whole-cell morphology and organelle statistics between the two cell types. Additionally, by analyzing vesicle size and molecular density distributions, we characterize subtle biophysical differences between individual insulin and glucagon vesicles that were not previously measurable with other methods. These sub-vesicular parameters enable us to predict cell types systematically using supervised machine learning. We also visualize distinct vesicle and cell subtypes using Uniform Manifold Approximation and Projection (UMAP) embeddings, which offers a new way to explore structural heterogeneity in islet cells. This methodology provides an approach for tracking biologically meaningful heterogeneity in cells that can be applied to any cellular system.
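    The abstract does not name the supervised classifier. As a minimal sketch, assuming each cell is represented by summaries of its per-vesicle size and density measurements, the block below builds one feature vector per cell and fits a nearest-centroid classifier. The helper names, the summary statistics, and the classifier are illustrative assumptions rather than the authors' pipeline, and the toy values are synthetic.

```python
import numpy as np

def cell_features(diameters_nm, densities):
    """Summarize one cell's vesicle measurements as [mean diameter, mean density].

    Hypothetical helper: `densities` stands for any per-vesicle density measure
    (for SXT, typically the linear absorption coefficient). Richer summaries
    (SD, quantiles, histograms) could be appended to the vector.
    """
    return np.array([np.mean(diameters_nm), np.mean(densities)], dtype=float)

class NearestCentroid:
    """Nearest-centroid classifier on z-scored features (an illustrative stand-in)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.mu_ = X.mean(axis=0)
        self.sd_ = X.std(axis=0)
        self.sd_[self.sd_ == 0] = 1.0  # guard against constant features
        Z = (X - self.mu_) / self.sd_
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([Z[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        Z = (np.asarray(X, dtype=float) - self.mu_) / self.sd_
        # Euclidean distance from each sample to each class centroid
        dist = np.linalg.norm(Z[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dist.argmin(axis=1)]

# Synthetic toy data with arbitrary values (not measurements from the study)
rng = np.random.default_rng(0)

def toy_cell(mean_d, mean_rho):
    return cell_features(rng.normal(mean_d, 20.0, 200), rng.normal(mean_rho, 0.02, 200))

X = np.array([toy_cell(300.0, 0.30) for _ in range(10)] + [toy_cell(250.0, 0.40) for _ in range(10)])
y = np.array(["type_A"] * 10 + ["type_B"] * 10)
clf = NearestCentroid().fit(X, y)
held_out = toy_cell(300.0, 0.30)
print(clf.predict(held_out[None, :]))
```

Nearest-centroid was chosen only because it is short and transparent; any classifier that accepts a cells-by-features matrix could take its place.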

  • Article type: Journal Article
    Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we used machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.
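    The abstract quantifies agreement among three observers but does not name the statistic used. Fleiss' kappa is one common choice when more than two raters label the same items; the sketch below assumes every pulse is labelled once by each observer, and the category names are hypothetical.

```python
import numpy as np

def fleiss_kappa(ratings):
    """Fleiss' kappa for items each labelled by the same number of raters.

    `ratings` has one entry per item (e.g., per pulse); each entry lists the
    category that every rater assigned to that item.
    """
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("need at least two raters, and the same number for every item")
    categories = sorted({label for item in ratings for label in item})
    index = {c: j for j, c in enumerate(categories)}

    # counts[i, j] = number of raters who put item i into category j
    counts = np.zeros((len(ratings), len(categories)))
    for i, item in enumerate(ratings):
        for label in item:
            counts[i, index[label]] += 1

    # Observed agreement: proportion of agreeing rater pairs per item, averaged
    p_item = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_item.mean()
    # Chance agreement from the pooled category proportions
    p_cat = counts.sum(axis=0) / counts.sum()
    p_e = np.sum(p_cat**2)
    if np.isclose(p_e, 1.0):
        raise ValueError("kappa is undefined when all ratings share one category")
    return float((p_bar - p_e) / (1.0 - p_e))

# Three observers labelling four pulses (hypothetical category names)
perfect = [["A", "A", "A"], ["A", "A", "A"], ["B", "B", "B"], ["B", "B", "B"]]
print(fleiss_kappa(perfect))  # 1.0
```

Values near 1 indicate agreement well beyond chance; values near or below 0 indicate agreement no better than chance.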

  • Article type: Journal Article
    Because of extreme conditions influenced by landfill location, pollutant release has recently been shown to be more severe in estuary landfills, as these sites are affected by interactions with both sea water and river water. To identify geographic and environmental features linked to the extreme conditions of certain landfills, a high-dimensional clustering method combining Uniform Manifold Approximation and Projection (UMAP) with the Louvain algorithm is proposed. A case study was conducted using 17 noteworthy features, transformed into a Landfill Suitability Index (LSI), for hundreds of landfill sites in Taiwan. This study grouped the landfills into 10 clusters and identified several clusters with markedly extreme locations, including estuary landfills (7.9 %), fault-water-body landfills (8.2 %), and densely-populated-water-body landfills (17.6 %). Furthermore, habitats of the endangered Platalea minor were discovered near these estuary landfills. Additionally, this work identified "healthy" landfills (11.2 %) that are minimally affected by the considered features. These findings demonstrate the potential of our framework to help managers systematically improve landfill management strategies. Moreover, our framework was tested by incorporating rainfall and flooding features related to climate change scenarios. To meet the demand for releasing land occupied by landfills in Taiwan, the transition to a circular economy urgently needs to be expedited, and our framework can provide further assistance in this regard. This approach is promising because it provides a new method for evaluating the environmental risks linked to landfills and also identifies potential opportunities related to landfill mining. Finally, this work was extended to a case study in England, which has 19,801 landfills and a dataset containing 15 relevant landfill features; in this case study, our framework identified 110 landfill clusters, several of which were in extreme locations, demonstrating that the framework can be flexibly applied in regions outside Taiwan.
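    Louvain clusters a graph by greedily moving nodes between communities to maximize modularity. The sketch below assumes a plain symmetric k-nearest-neighbour graph as a simplified stand-in for the fuzzy neighbour graph that UMAP builds, and shows that graph together with the modularity score Louvain optimizes. The greedy moves themselves are omitted (recent networkx releases provide `networkx.community.louvain_communities` for that step), and the function names are illustrative.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric, unweighted k-nearest-neighbour adjacency matrix (Euclidean)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((len(X), len(X)))
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nn.ravel()] = 1.0
    return np.maximum(A, A.T)  # keep an edge if either endpoint lists the other

def modularity(A, labels):
    """Newman modularity Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * [c_i == c_j]."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()
    same = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return float(np.sum((A - expected) * same) / two_m)

# Two well-separated synthetic blobs should form two low-overlap communities
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (10, 2)), rng.normal(100.0, 1.0, (10, 2))])
A = knn_graph(X, k=3)
print(modularity(A, [0] * 10 + [1] * 10))
```

A partition that keeps dense neighbourhoods together scores higher than one that cuts through them, which is what lets Louvain recover clusters such as the estuary group from the feature space.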

  • Article type: Journal Article
    Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales ranging from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet our ability to explore tvFC is severely constrained by its large dimensionality (several thousand dimensions). To overcome this difficulty, researchers often seek to generate low-dimensional representations (e.g., 2D and 3D scatter plots), hoping these will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs), namely those seeking to infer a low-dimensional non-linear surface (i.e., the manifold) on which most of the data lie, are good candidates for this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low-dimensional manifold. Second, we estimate the intrinsic dimension (ID; i.e., the minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LEs), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neurobiologically meaningful representations of tvFC data, as well as its robustness to hyper-parameter selection. Our results show that tvFC data have an ID ranging from 4 to 26, and that the ID varies significantly between rest and task states. We also show that all three methods can effectively capture subject identity and the task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, whereas LE captures only one at a time. We observed substantial variability in embedding quality across MLTs, and within each MLT as a function of hyper-parameter selection. To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in applying MLTs to tvFC data. Overall, we conclude that while MLTs can be useful for generating summary views of labeled tvFC data, their application to unlabeled data such as resting-state data remains challenging.
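    The abstract does not state how tvFC was computed or which ID estimator was used. A minimal sketch under two assumptions: tvFC from sliding-window Pearson correlations, whose vectorized upper triangle has R(R-1)/2 entries for R regions (4,950 for 100 regions, which illustrates the "several thousand" dimensions), and intrinsic dimension from the maximum-likelihood form of the TwoNN estimator, which uses only each point's first and second nearest-neighbour distances.

```python
import numpy as np

def sliding_window_fc(ts, window, step):
    """Time-varying FC as sliding-window Pearson correlations.

    ts has shape (n_timepoints, n_regions). Returns shape
    (n_windows, n_regions * (n_regions - 1) // 2): one vectorized upper
    triangle of the correlation matrix per window.
    """
    ts = np.asarray(ts, dtype=float)
    n_t, n_r = ts.shape
    iu = np.triu_indices(n_r, k=1)
    rows = []
    for start in range(0, n_t - window + 1, step):
        r = np.corrcoef(ts[start:start + window].T)  # regions x regions
        rows.append(r[iu])
    return np.array(rows)

def twonn_intrinsic_dimension(X):
    """Maximum-likelihood TwoNN estimate: N / sum(log(r2 / r1)).

    r1, r2 are each point's first and second nearest-neighbour distances;
    assumes no duplicate points (r1 > 0).
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-distances
    two_smallest = np.sort(np.partition(d2, 1, axis=1)[:, :2], axis=1)
    mu = np.sqrt(two_smallest[:, 1] / two_smallest[:, 0])
    return len(X) / np.sum(np.log(mu))

rng = np.random.default_rng(0)
tvfc = sliding_window_fc(rng.normal(size=(300, 20)), window=40, step=5)
print(tvfc.shape)  # (53, 190)

# A 2-D plane linearly embedded in 10-D: the estimate should be roughly 2
plane = rng.uniform(size=(2000, 2)) @ rng.normal(size=(2, 10))
print(twonn_intrinsic_dimension(plane))
```

Because TwoNN only looks at the two closest neighbours, it is comparatively insensitive to curvature and density variation, but it does assume that the nearest neighbours are not dominated by temporally adjacent windows; the abstract's point about temporal autocorrelation is relevant here.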

  • Article type: Journal Article
    Many researchers focus on enhancing automated systems that identify emotions and therefore rely on brain signals. This study focuses on how brain wave signals can be used to classify multiple human emotional states. Electroencephalography (EEG)-based affective computing predominantly focuses on emotion classification based on facial expression, speech recognition, and text-based recognition through multimodal stimuli. The proposed work aims to implement a methodology to identify and codify discrete complex emotions, such as pleasure and grief, in a rare psychological disorder known as alexithymia. This disorder is frequently elicited in unstable, fragile countries such as South Sudan, Lebanon, and Mauritius, which are continually affected by civil war, disaster, and political instability, leading to very poor economies and education systems. This study focuses on a dataset from an adolescent age group, recording physiological data while emotions are exhibited in a multimodal virtual environment. Using a complex Morlet wavelet, we derived time-frequency and amplitude time-series correlates, including frontal alpha asymmetry. For data visualization, we used UMAP to obtain a clear view of distinct emotion regions. We performed 5-fold cross-validation together with subjective classification over 1 s windows on the dataset. We opted for traditional machine learning techniques to assign complex emotion labels.
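    A minimal sketch of the two signal-processing steps named above, assuming single-channel signals from a left/right frontal electrode pair (commonly F3/F4): alpha-band power from convolution with complex Morlet wavelets, and frontal alpha asymmetry computed as ln(right power) - ln(left power), a widely used convention. The 7-cycle wavelet width and the 8 to 13 Hz band are assumptions, not parameters reported in the abstract.

```python
import numpy as np

def morlet_power(signal, fs, freq, n_cycles=7.0):
    """Power time course at `freq` via convolution with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)               # Gaussian width (s)
    t = np.arange(-3.0 * sigma_t, 3.0 * sigma_t, 1.0 / fs)  # wavelet support (s)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet /= np.sum(np.abs(wavelet))  # unit gain for a complex exponential at `freq`
    analytic = np.convolve(np.asarray(signal, dtype=float), wavelet, mode="same")
    return np.abs(analytic) ** 2

def frontal_alpha_asymmetry(left, right, fs, band=(8.0, 13.0)):
    """ln(right alpha power) - ln(left alpha power), e.g. F4 versus F3."""
    freqs = np.arange(band[0], band[1] + 1.0)  # 1 Hz steps across the band
    p_left = np.mean([morlet_power(left, fs, f).mean() for f in freqs])
    p_right = np.mean([morlet_power(right, fs, f).mean() for f in freqs])
    return float(np.log(p_right) - np.log(p_left))

# Synthetic check: a 10 Hz rhythm twice as large on the right channel
fs = 256
t = np.arange(0, 4, 1 / fs)
left = np.sin(2 * np.pi * 10 * t)
right = 2 * left
print(frontal_alpha_asymmetry(left, right, fs))  # ln(4), about 1.386
```

More wavelet cycles sharpen frequency resolution at the cost of temporal resolution; for 1 s analysis windows, the wavelet support (about 0.8 s at 8 Hz with 7 cycles) is already close to the window length.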

  • Article type: Journal Article
    The rampant increase in drug-resistant tuberculosis (TB) remains a major challenge not only for treatment management but also for diagnosis, as well as drug design and development. Drug-resistant mycobacteria impair quality of life because of delayed diagnosis and the need for prolonged treatment with multiple toxic drugs. The phenotypic modulations defining the immune status of an individual during tuberculosis are well established. The present study aims to explore the phenotypic changes of monocytes and dendritic cells (DCs), as well as their subsets, across the TB disease spectrum, from latency to drug-sensitive TB (DS-TB) and drug-resistant TB (DR-TB), using traditional immunophenotypic analysis and uniform manifold approximation and projection (UMAP) analysis. Our results demonstrate changes in the frequencies of monocytes (classical, CD14++CD16-; intermediate, CD14++CD16+; and non-classical, CD14+/-CD16++) and dendritic cells (HLA-DR+CD11c+ myeloid DCs, cross-presenting HLA-DR+CD14-CD141+ myeloid DCs, and HLA-DR+CD14-CD16-CD11c-CD123+ plasmacytoid DCs), together with elevated monocyte-to-lymphocyte ratios (MLR) and neutrophil-to-lymphocyte ratios (NLR) and altered cytokine levels between the DS-TB and DR-TB groups. UMAP analysis revealed significant differential expression of CD14+, CD16+, CD86+, and CD64+ on monocytes and of CD123+ on DCs in the DR-TB group. Thus, our study reveals differences in monocyte and DC subset frequencies among the various TB disease groups with respect to modulating immune responses, which will help in understanding the pathogenicity driven by Mycobacterium tuberculosis.
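    The ratios mentioned above are simple quotients of absolute counts, and the monocyte subsets are defined by CD14 and CD16 intensity. A minimal sketch, assuming counts in a common unit and a simplified quadrant gate applied to already gated monocytes; the cut-off values are hypothetical (in practice they come from controls or manual gating), and the UMAP step itself is not shown.

```python
import numpy as np

def blood_ratios(neutrophils, lymphocytes, monocytes):
    """Neutrophil-to-lymphocyte (NLR) and monocyte-to-lymphocyte (MLR) ratios
    from absolute counts given in the same unit (e.g., cells per microlitre)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {"NLR": neutrophils / lymphocytes, "MLR": monocytes / lymphocytes}

def monocyte_subsets(cd14, cd16, cd14_cut, cd16_cut):
    """Simplified quadrant gate on CD14 vs CD16 intensities of pre-gated monocytes.

    classical: CD14 high / CD16 low; intermediate: CD14 high / CD16 high;
    non-classical: CD14 low-to-dim / CD16 high. Cut-offs are hypothetical.
    """
    cd14 = np.asarray(cd14, dtype=float)
    cd16 = np.asarray(cd16, dtype=float)
    labels = np.full(cd14.shape, "unassigned", dtype=object)
    hi14, hi16 = cd14 >= cd14_cut, cd16 >= cd16_cut
    labels[hi14 & ~hi16] = "classical"
    labels[hi14 & hi16] = "intermediate"
    labels[~hi14 & hi16] = "non-classical"
    return labels

# Hypothetical counts and intensities, for illustration only
print(blood_ratios(neutrophils=4000, lymphocytes=2000, monocytes=500))  # {'NLR': 2.0, 'MLR': 0.25}
print(list(monocyte_subsets([900, 900, 100, 100], [10, 500, 800, 10], cd14_cut=500, cd16_cut=200)))
```

Subset frequencies per patient then follow from counting the labels, which is the quantity compared across the latent, DS-TB, and DR-TB groups.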

  • Article type: Journal Article
    Studies on the interactions between SARS-CoV-2 and humoral immunity are fundamental to developing effective therapies, including vaccines. We used polychromatic flow cytometry, coupled with unsupervised data analysis and principal component analysis (PCA), to interrogate B cells in untreated patients with COVID-19 pneumonia. COVID-19 patients displayed normal plasma levels of the main immunoglobulin classes and of antibodies against common antigens or against antigens present in common vaccines. However, we found decreased numbers of total and naïve B cells, along with decreased percentages and numbers of switched and unswitched memory B cells. In contrast, IgM+ and IgM- plasmablasts were significantly increased. In vitro cell activation revealed that B lymphocytes showed a normal proliferation index and number of dividing cells per cycle. PCA indicated that B-cell number and naïve and memory B cells, but not plasmablasts, clustered with patients who were discharged, while plasma IgM level, C-reactive protein, D-dimer, and SOFA score clustered with those who died. In patients with pneumonia, derangement of the B-cell compartment could be one of the causes of the immunological failure to control SARS-CoV-2, could have a relevant influence on several pathways, organs, and systems, and must be considered when developing vaccine strategies.
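    A minimal PCA sketch, assuming a patients-by-variables matrix (for example B-cell counts, plasma IgM, C-reactive protein, D-dimer, and SOFA score) filled here with hypothetical values: variables are standardized, decomposed with an SVD, and returned as patient scores plus variable loadings, which is what a biplot reading of "variables clustering with outcomes" relies on. This is a generic illustration, not the authors' analysis.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of standardized variables via SVD.

    Returns patient scores, variable loadings (correlations between each
    variable and each component), and explained-variance ratios. Standardizing
    keeps variables measured on different scales from dominating the components.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    var = S**2 / (len(X) - 1)                     # component variances (eigenvalues)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T * np.sqrt(var[:n_components])
    return scores, loadings, var[:n_components] / var.sum()

# Hypothetical data: two nearly redundant variables plus one independent variable
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x + 0.01 * rng.normal(size=200), rng.normal(size=200)])
scores, loadings, ratio = pca(X)
print(np.round(ratio, 2))
```

In a biplot, variables whose loading vectors point toward the region occupied by one outcome group (for example discharged patients) are the ones described as "clustering" with that group.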