Unsupervised clustering

  • Article type: Journal Article
    Background: Binning of metagenomic reads is an active area of research, and many unsupervised machine learning techniques have been used for taxonomy-independent binning of metagenomic reads. Objective: It is important both to find the optimal number of clusters and to develop an efficient pipeline for deciphering the complexity of microbial genomes. Methods: Applying unsupervised clustering techniques to binning requires the optimal number of clusters to be found beforehand, which is known to be a difficult task. This paper describes a novel method, MetaConClust, which uses coverage information to group contigs and a consensus-based clustering approach to find the optimal number of clusters for binning metagenomic data automatically. The coverage of contigs in a metagenomic sample has been observed to be directly proportional to the abundance of the species in the sample, and MetaConClust uses it to group the data in the first phase. In the second phase, the Partitioning Around Medoids (PAM) method is used to generate bins, with the initial number of clusters determined automatically through a consensus-based method. Results: Finally, the quality of the obtained bins is tested using the silhouette index, Rand index, recall, precision, and accuracy. The performance of MetaConClust is compared with recent methods and tools on benchmark low-complexity simulated and real metagenomic datasets; it is found to be better than unsupervised methods and comparable to hybrid methods. Conclusion: This suggests that the consensus-based clustering approach is a promising method for automatically finding the number of bins for metagenomic data.
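    The second phase described above (PAM clustering scored with the silhouette index) can be illustrated with a small sketch. This is not the MetaConClust implementation: the abstract does not detail its consensus procedure, so the sketch swaps in a plainer rule and simply picks the number of clusters with the highest mean silhouette. The coverage values are synthetic, and all function names (`pairwise_distances`, `pam`, `mean_silhouette`, `choose_k`) are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (shape: n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pam(D, k):
    """Partitioning Around Medoids on a precomputed distance matrix D."""
    n = D.shape[0]
    # BUILD: start from the most central point, then greedily add the
    # candidate that most reduces the total distance to the nearest medoid.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        gains = np.maximum(nearest[:, None] - D, 0.0).sum(axis=0)
        gains[medoids] = -1.0  # never pick an existing medoid again
        medoids.append(int(np.argmax(gains)))

    def total_cost(m):
        return D[:, m].min(axis=1).sum()

    # SWAP: accept any medoid/non-medoid exchange that lowers the cost.
    best = total_cost(medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                cost = total_cost(trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels


def mean_silhouette(D, labels):
    """Mean silhouette index; singleton clusters score 0 by convention."""
    clusters = np.unique(labels)
    scores = np.zeros(D.shape[0])
    for i in range(D.shape[0]):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def choose_k(D, k_max):
    """Assumed selection rule: the k in 2..k_max with the best mean silhouette."""
    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(2, k_max + 1):
        _, labels = pam(D, k)
        score = mean_silhouette(D, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels


# Synthetic contig coverages: three species at clearly different abundances.
rng = np.random.default_rng(0)
coverage = np.concatenate([rng.normal(m, 1.0, 20) for m in (10.0, 50.0, 90.0)])
best_k, labels = choose_k(pairwise_distances(coverage[:, None]), k_max=6)
print(best_k, np.bincount(labels))
```

    A consensus-based variant would repeat the clustering over several resamplings or index choices and combine the votes; the single-index rule here is only the simplest stand-in for that step.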

  • Article type: Journal Article
    Liver cirrhosis is one of the most common causes of death in the world. The disease progresses from a healthy state through liver cirrhosis to liver cancer, which makes diagnosis very challenging. Drug targets that can be obtained conveniently can help clinicians improve prognosis and treatment. Liver cirrhosis is associated with serum calcium levels, and studies have reported that Tanshinone IIA plays a therapeutic role in liver injury by activating calcium-dependent apoptosis. In this study, we explored the key diagnostic targets of Tanshinone IIA in liver cirrhosis using a comprehensive dataset that included healthy individuals, liver cirrhosis patients, and liver cancer patients. An unsupervised consensus clustering algorithm identified 3 novel subtypes, and differentially expressed genes (DEGs) between each pair of subtypes were found by pairwise comparison. Then, 4 key drug targets of Tanshinone IIA were determined from the intersection of these DEGs. The diagnostic performance of the target genes was assessed and further verified in an external dataset. We found that the 4 key drug targets could be used as effective diagnostic biomarkers. Immune scores in the high- and low-expression groups of the target genes were then estimated to identify significantly expressed immune cells. In addition, immune infiltration differed significantly between the high- and low-expression groups for several immune cell types. The findings suggest that the 4 key drug targets may be a simple and useful diagnostic tool for predicting cirrhosis in patients. We further studied the carcinogenic role of AKR1C3 and TPX2 in vitro. Both mRNA and protein expression in hepatoma cells were detected using qRT-PCR and Western blot, and knockdown of AKR1C3 and TPX2 significantly suppressed cell proliferation, migration, and invasion.
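    The pairwise-comparison and intersection step described above can be sketched as follows. The abstract does not say which statistical test or thresholds were used, so this sketch assumes a bare cut-off on the difference in mean log-expression, which is a simplification of a real DEG analysis. The expression matrix is synthetic, and the gene names, subtype labels, and the `pairwise_degs` helper are hypothetical.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
genes = ["g0", "g1", "g2", "g3"]  # hypothetical gene names

# Hypothetical mean log-expression per subtype (rows) and gene (columns).
subtype_means = np.array([
    [2.0, 2.0, 5.0, 8.0],  # subtype A
    [5.0, 5.0, 5.0, 5.0],  # subtype B
    [8.0, 5.0, 5.0, 2.0],  # subtype C
])
samples_per_subtype = 5

# Synthetic expression matrix: samples in rows, genes in columns.
expr = np.vstack([
    rng.normal(mu, 0.1, size=(samples_per_subtype, len(genes)))
    for mu in subtype_means
])
labels = np.repeat(["A", "B", "C"], samples_per_subtype)


def pairwise_degs(expr, labels, genes, min_diff=1.0):
    """Genes whose mean expression differs by at least min_diff, per subtype pair.

    The mean-difference cut-off is an assumption made for illustration only.
    """
    degs = {}
    for a, b in combinations(sorted(set(labels.tolist())), 2):
        diff = expr[labels == a].mean(axis=0) - expr[labels == b].mean(axis=0)
        degs[(a, b)] = {g for g, d in zip(genes, diff) if abs(d) >= min_diff}
    return degs


degs = pairwise_degs(expr, labels, genes)
# Candidate genes are those that separate every pair of subtypes.
shared = set.intersection(*degs.values())
print(sorted(shared))
```

    Intersecting across all pairs keeps only genes that distinguish every subtype from every other, which is why a gene that differs in just one subtype drops out.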