BACKGROUND: Given the diversity of omics data and the difficulty of choosing a single result among those produced by several methods, consensus strategies have the potential to reconcile multiple inputs and produce robust results.
RESULTS: Here, we introduce ClustOmics, a generic consensus clustering tool that we use in the context of cancer subtyping. ClustOmics relies on a non-relational graph database, which allows the simultaneous integration of multiple omics data types and of results from various clustering methods. The tool reconciles input clusterings regardless of their origin, number, size, or shape. ClustOmics implements an intuitive and flexible strategy based on the idea of evidence accumulation clustering: it counts how often each pair of samples co-occurs in the same input cluster and uses this count as a similarity measure to reorganize the data into consensus clusters.
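The co-occurrence idea can be illustrated with a minimal sketch. It is not the ClustOmics implementation: the function names `co_occurrence` and `consensus_clusters`, the `min_support` threshold, and the toy sample identifiers are all assumptions made for illustration, and the final grouping step (connected components of the thresholded co-occurrence graph) is a simple stand-in for whatever graph clustering a real tool would apply.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(clusterings):
    """Count, for each pair of samples, how many input clusterings
    place both samples in the same cluster.

    `clusterings` is a list of dicts mapping sample id -> cluster label.
    A sample absent from a clustering simply contributes nothing to it.
    """
    counts = defaultdict(int)
    for labels in clusterings:
        groups = defaultdict(list)
        for sample, label in labels.items():
            groups[label].append(sample)
        for members in groups.values():
            # Sorting gives each unordered pair a single canonical key.
            for a, b in combinations(sorted(members), 2):
                counts[(a, b)] += 1
    return dict(counts)


def consensus_clusters(clusterings, min_support):
    """Group samples linked by pairs seen together at least `min_support`
    times (hypothetical threshold). Connected components are used here
    as a stand-in for a proper graph clustering step."""
    counts = co_occurrence(clusterings)
    samples = sorted({s for labels in clusterings for s in labels})
    parent = {s: s for s in samples}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= min_support:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for s in samples:
        clusters[find(s)].append(s)
    return sorted(sorted(c) for c in clusters.values())


# Toy input: three clusterings with different label sets and coverage.
example = [
    {"s1": 0, "s2": 0, "s3": 1, "s4": 1},
    {"s1": "a", "s2": "a", "s3": "a", "s4": "b"},
    {"s1": 1, "s2": 1, "s3": 2, "s4": 2, "s5": 2},
]
pair_counts = co_occurrence(example)
result = consensus_clusters(example, min_support=2)
```

With this toy input, `s1` and `s2` co-occur in all three clusterings and `s3` and `s4` in two, so a threshold of 2 yields the groups `[s1, s2]`, `[s3, s4]`, and the singleton `[s5]`. Note that the input clusterings need not share labels, cluster counts, or even the same set of samples, which is the property the abstract emphasizes.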
CONCLUSIONS: We applied ClustOmics to multi-omics disease subtyping on real TCGA cancer data from ten different cancer types. We showed that ClustOmics is robust to input partitions of heterogeneous quality, smoothing and reconciling preliminary predictions into consensus clusters that are of high quality from both a computational and a biological point of view. A comparison with COCA, a state-of-the-art consensus-based integration tool, further corroborated this finding. However, the main interest of ClustOmics is not to compete with other tools, but rather to benefit from their various predictions when no gold-standard metric is available to assess their significance.
AVAILABILITY: The ClustOmics source code, released under the MIT license, and the results obtained on TCGA cancer data are available on GitHub: https://github.com/galadrielbriere/ClustOmics