Data integration

  • Article type: Journal Article
    The future development of personalized medicine depends on a vast exchange of data from different sources, as well as on the harmonized, integrative analysis of large-scale clinical health and sample data. Computational-modelling approaches play a key role in analysing the molecular processes and pathways that underlie human biology, and they also lead to a deeper understanding of the mechanisms and factors that drive disease; hence, they enable personalized treatment strategies guided by central clinical questions. However, despite the growing popularity of computational-modelling approaches in different stakeholder communities, many hurdles remain before they can be implemented in routine clinical practice. In particular, integrating heterogeneous data from multiple sources and of multiple types is a challenging task that requires clear guidelines, which must in turn comply with high ethical and legal standards. Here, we discuss in detail the computational models most relevant to personalized medicine, which can be considered best-practice guidelines for application in clinical care. We define specific challenges and provide applicable guidelines and recommendations for study design, data acquisition and operation, model validation, clinical translation, and other research areas.

  • Article type: Journal Article
    BACKGROUND: Given the diversity of omics data and the difficulty of selecting one result among all those produced by several methods, consensus strategies have the potential to reconcile multiple inputs and to produce robust results.
    RESULTS: Here, we introduce ClustOmics, a generic consensus clustering tool that we use in the context of cancer subtyping. ClustOmics relies on a non-relational graph database, which allows for the simultaneous integration of both multiple omics data and results from various clustering methods. This new tool reconciles input clusterings regardless of their origin, number, size or shape. ClustOmics implements an intuitive and flexible strategy based on the idea of evidence accumulation clustering: it counts how often each pair of samples co-occurs in the input clusters and uses this count as a similarity measure to reorganize the data into consensus clusters.
    CONCLUSIONS: We applied ClustOmics to multi-omics disease subtyping on real TCGA cancer data from ten different cancer types. We showed that ClustOmics is robust to input partitions of heterogeneous quality, smoothing and reconciling preliminary predictions into consensus clusters that are of high quality from both a computational and a biological point of view. A comparison with COCA, a state-of-the-art consensus-based integration tool, further corroborated this statement. However, the main interest of ClustOmics is not to compete with other tools, but rather to benefit from their various predictions when no gold-standard metric is available to assess their significance.
    AVAILABILITY: The ClustOmics source code, released under the MIT license, and the results obtained on TCGA cancer data are available on GitHub: https://github.com/galadrielbriere/ClustOmics.
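
    The co-occurrence idea described in the RESULTS paragraph can be illustrated with a minimal sketch. It is not the ClustOmics implementation: the function names, the `min_support` threshold and the use of connected components to form the final groups are all assumptions made for illustration, and the abstract does not say how ClustOmics derives clusters from its graph.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(clusterings):
    """Count, for each pair of samples, how many input clusterings co-cluster them.

    Each clustering is a dict mapping sample name -> cluster label.
    Samples absent from a clustering simply contribute no evidence for it.
    """
    counts = Counter()
    for labels in clusterings:
        for a, b in combinations(sorted(labels), 2):
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return counts


def consensus_clusters(clusterings, min_support):
    """Group samples linked by pairs co-clustered in at least `min_support` inputs.

    Assumption: connected components of the thresholded graph stand in for
    whatever graph clustering a real tool would apply.
    """
    counts = co_occurrence(clusterings)
    samples = sorted({s for labels in clusterings for s in labels})
    parent = {s: s for s in samples}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


# Three hypothetical input clusterings of four samples.
inputs = [
    {"s1": 0, "s2": 0, "s3": 1, "s4": 1},
    {"s1": 0, "s2": 0, "s3": 0, "s4": 1},
    {"s1": 1, "s2": 1, "s3": 2, "s4": 2},
]
print(consensus_clusters(inputs, min_support=2))  # [['s1', 's2'], ['s3', 's4']]
```

    Raising `min_support` demands agreement from more inputs and therefore yields smaller, more conservative consensus clusters.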

  • Article type: Journal Article
    Advances in deep learning techniques have the potential to make significant contributions to healthcare, particularly in fields that use medical imaging for diagnosis, prognosis, and treatment decisions. The current state-of-the-art deep learning models for radiology applications consider only pixel-value information, without data that convey the clinical context. Yet in practice, pertinent and accurate non-imaging data from the clinical history and laboratory results enable physicians to interpret imaging findings in the appropriate clinical context, leading to higher diagnostic accuracy, better-informed clinical decision making, and improved patient outcomes. To achieve a similar goal with deep learning, pixel-based medical imaging models must also be able to process contextual data from electronic health records (EHR) in addition to pixel data. In this paper, we describe different data fusion techniques that can be applied to combine medical imaging with EHR, and systematically review the medical data fusion literature published between 2012 and 2020. We conducted a systematic search of PubMed and Scopus for original research articles that use deep learning to fuse multimodal data. In total, we screened 985 studies and extracted data from 17 papers. Through this systematic review, we present current knowledge, summarize important results and provide implementation guidelines to serve as a reference for researchers interested in applying multimodal fusion to medical imaging.
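
    The abstract does not name the fusion techniques it reviews. As a hedged illustration only, the sketch below contrasts two families commonly distinguished in multimodal learning: feature-level ("early") fusion, which concatenates the modalities before a single model sees them, and decision-level ("late") fusion, which combines the outputs of separate per-modality models. The function names, the weighting scheme and the toy values are assumptions, not taken from the reviewed paper.

```python
def early_fusion(image_features, ehr_features):
    """Feature-level fusion: concatenate both modalities into one input vector."""
    return list(image_features) + list(ehr_features)


def late_fusion(image_prob, ehr_prob, image_weight=0.5):
    """Decision-level fusion: weighted average of two per-modality probabilities."""
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    return image_weight * image_prob + (1.0 - image_weight) * ehr_prob


# Hypothetical inputs: two imaging features plus age and a binary lab flag.
fused_input = early_fusion([0.2, 0.9], [63, 1])
print(fused_input)  # [0.2, 0.9, 63, 1]

# Hypothetical probabilities from an imaging model and an EHR model.
print(late_fusion(0.75, 0.25))  # 0.5
```

    In the early variant a single downstream model learns from the joint vector, whereas the late variant keeps the two models independent and only merges their predictions.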

  • Article type: Journal Article
    Despite continuing technological advances in producing data in health and clinical research, the generation of new knowledge for medical benefit through advanced analytics still falls short of its full potential. The reasons for this gap are the inherent heterogeneity of data sources and the lack of broadly accepted standards. Further hurdles are associated with legal and ethical issues surrounding the use of personal/patient data across disciplines and borders. Consequently, there is a need for broadly applicable standards that comply with legal and ethical regulations and allow heterogeneous health data to be interpreted through in silico methodologies to advance personalized medicine. To tackle these standardization challenges, the Horizon2020 Coordinating and Support Action EU-STANDS4PM initiated an EU-wide mapping process to evaluate strategies for data integration and data-driven in silico modelling approaches, with the aim of developing standards, recommendations and guidelines for personalized medicine. A first step towards this goal is a broad stakeholder consultation process initiated by an EU-STANDS4PM workshop at the annual COMBINE meeting (COMBINE 2019 workshop report in the same issue). This forum analysed the status quo of data and model standards and reflected on the possibilities and challenges of cross-domain data integration to facilitate in silico modelling approaches for personalized medicine.

  • Article type: Journal Article
    BACKGROUND: In unsupervised learning and clustering, integrating data from different sources and of different types is a difficult question discussed in several research areas. In omics analysis, for instance, dozens of clustering methods have been developed in the past decade. When a single source of data is at play, hierarchical clustering (HC) is extremely popular, as a tree structure is highly interpretable and arguably more informative than a mere partition of the data. However, blindly applying HC to multiple sources of data raises computational and interpretation issues.
    RESULTS: We propose mergeTrees, a method that aggregates a set of trees with the same leaves to create a consensus tree. In our consensus tree, a cluster at height h contains the individuals that are in the same cluster for all the trees at height h. The method is exact and proven to be [Formula: see text], n being the number of individuals and q being the number of trees to aggregate. Our implementation is extremely effective on simulations, allowing us to process many large trees at a time. We also rely on mergeTrees to perform the cluster analysis of two real omics data sets, introducing a spectral variant as an efficient and robust by-product.
    CONCLUSIONS: Our tree aggregation method can be used in conjunction with hierarchical clustering to perform efficient cluster analysis. This approach was found to be robust to the absence of clustering information in some of the data sets as well as to increased variability within true clusters. The method is implemented in R/C++ and available as an R package named mergeTrees, which makes it easy to integrate into existing or new pipelines in several research areas.
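
    The consensus rule stated in the RESULTS paragraph (two individuals share a cluster at height h only if every tree groups them at height h) can be sketched naively as follows. This is an illustration of the definition, not the mergeTrees algorithm: it recomputes every cut from scratch and makes no attempt to match the complexity claimed in the abstract. The representation of a tree as a list of `(height, leaf_a, leaf_b)` merge events and both function names are assumptions.

```python
def cut(merges, leaves, h):
    """Partition `leaves` by applying every merge whose height is <= h.

    `merges` is a list of (height, leaf_a, leaf_b) events, each joining the
    clusters that currently contain leaf_a and leaf_b.
    """
    parent = {x: x for x in leaves}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for height, a, b in merges:
        if height <= h:
            parent[find(a)] = find(b)
    return {x: find(x) for x in leaves}


def consensus_cut(trees, leaves, h):
    """Intersect the partitions obtained by cutting every tree at height h."""
    cuts = [cut(merges, leaves, h) for merges in trees]
    groups = {}
    for x in leaves:
        # Leaves share a consensus cluster only if their labels agree in every tree.
        key = tuple(c[x] for c in cuts)
        groups.setdefault(key, []).append(x)
    return sorted(groups.values())


# Two hypothetical trees over the same four leaves.
leaves = ["a", "b", "c", "d"]
tree1 = [(1.0, "a", "b"), (2.0, "c", "d"), (3.0, "a", "c")]
tree2 = [(1.5, "a", "b"), (2.5, "b", "c"), (4.0, "c", "d")]

print(consensus_cut([tree1, tree2], leaves, 2.0))  # [['a', 'b'], ['c'], ['d']]
print(consensus_cut([tree1, tree2], leaves, 3.0))  # [['a', 'b', 'c'], ['d']]
```

    Sweeping h over the distinct merge heights of all input trees recovers the full consensus hierarchy one level at a time.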

  • Article type: Journal Article
    Next-generation sequencing has allowed the identification of millions of somatic mutations in human cancer cells. A key challenge in interpreting cancer genomes is to distinguish the drivers of cancer development among the available genetic mutations. To address this issue, we present the first web-based application, consensus cancer driver gene caller (C3), to identify consensus driver genes using six different complementary strategies, i.e., frequency-based, machine learning-based, functional bias-based, clustering-based, statistical model-based, and network-based strategies. This application allows users to specify customized operations when calling driver genes, and provides solid statistical evaluations and interpretable visualizations of the integration results. C3 is implemented in Python and is freely available for public use at http://drivergene.rwebox.com/c3.
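
    The abstract does not state how C3 combines the six strategies. One simple way to picture a consensus call, offered purely as an assumption, is vote counting: a gene is kept when at least a chosen number of strategies report it. The function name, the `min_votes` parameter, the strategy labels and the placeholder gene names below are all hypothetical.

```python
from collections import Counter


def consensus_genes(calls_by_strategy, min_votes):
    """Return genes reported by at least `min_votes` strategies, sorted by name."""
    votes = Counter(
        gene
        for genes in calls_by_strategy.values()
        for gene in set(genes)  # a strategy votes at most once per gene
    )
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


# Placeholder calls from three hypothetical strategies.
calls = {
    "frequency": ["GENE_A", "GENE_B"],
    "network": ["GENE_B", "GENE_C"],
    "clustering": ["GENE_B", "GENE_A", "GENE_A"],
}
print(consensus_genes(calls, min_votes=2))  # ['GENE_A', 'GENE_B']
```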

  • Article type: Journal Article
    Mathematical modeling and numerical simulation are crucial to support design decisions in synthetic biology. Accurate estimation of parameter values is key, as direct experimental measurements are difficult and time-consuming. Insufficient data, incompatible measurements, and specialized models that lack universal parameters make this task challenging. Here, we have created a database (PAMDB) that integrates data from 135 publications that contain 118 circuits and 165 genetic parts of the bacterium Escherichia coli. We used a succinct, universal model formulation to describe the part behavior in each circuit. We introduce a constrained consensus inference method, use it to infer the values of the model parameters, and evaluate its performance through cross-validation on a benchmark of 23 circuits. We discuss these results and summarize the challenges in data integration and parameter inference. This work provides a resource and a methodology that can be used as a point of reference for synthetic circuit modeling.

  • Article type: Journal Article
    It is widely accepted that data harmonization is crucial: in its absence, the co-analysis of major tranches of high-quality extant data is liable to inefficiency or error. However, despite its widespread practice, no formalized or systematic guidelines exist to ensure high-quality retrospective data harmonization.
    To better understand real-world harmonization practices and facilitate development of formal guidelines, three interrelated initiatives were undertaken between 2006 and 2015. They included a phone survey with 34 major international research initiatives, a series of workshops with experts, and case studies applying the proposed guidelines.
    A wide range of projects use retrospective harmonization to support their research activities but even when appropriate approaches are used, the terminologies, procedures, technologies and methods adopted vary markedly. The generic guidelines outlined in this article delineate the essentials required and describe an interdependent step-by-step approach to harmonization: 0) define the research question, objectives and protocol; 1) assemble pre-existing knowledge and select studies; 2) define targeted variables and evaluate harmonization potential; 3) process data; 4) estimate quality of the harmonized dataset(s) generated; and 5) disseminate and preserve final harmonization products.
    This manuscript provides guidelines aiming to encourage rigorous and effective approaches to harmonization which are comprehensively and transparently documented and straightforward to interpret and implement. This can be seen as a key step towards implementing guiding principles analogous to those that are well recognised as being essential in securing the foundational underpinning of systematic reviews and the meta-analysis of clinical trials.
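
    Steps 2 and 3 of the approach (defining a target variable and processing study data into it) can be pictured with a minimal sketch. Everything in it is hypothetical: the study names, the source variable names, the choice of height in centimetres as the target variable and the one-decimal rounding are assumptions for illustration and do not come from the guidelines.

```python
# Hypothetical per-study rules mapping a source variable onto one target
# variable, height in centimetres.
RULES = {
    "study_a": ("height_cm", lambda v: float(v)),
    "study_b": ("height_in", lambda v: float(v) * 2.54),
}


def harmonize(study, record):
    """Return the harmonized height for one record, or None if it cannot be derived."""
    rule = RULES.get(study)
    if rule is None:
        # No rule defined: this study cannot contribute to the target variable.
        return None
    source, convert = rule
    value = record.get(source)
    return None if value is None else round(convert(value), 1)


print(harmonize("study_a", {"height_cm": "181"}))  # 181.0
print(harmonize("study_b", {"height_in": 70}))     # 177.8
print(harmonize("study_c", {"height_m": 1.8}))     # None
```

    Returning None rather than guessing keeps the harmonization potential of each study explicit, which is the kind of information step 2 is meant to document.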