Data integration

  • Article type: Journal Article
    Accurate diagnosis is the key to providing prompt and explicit treatment and disease management. The recognized biological method for the molecular diagnosis of infectious pathogens is polymerase chain reaction (PCR). Recently, deep learning approaches have come to play a vital role in accurately identifying disease-related genes for diagnosis, prognosis, and treatment. Such models reduce the time and cost of wet-lab experimental procedures. Consequently, sophisticated computational approaches have been developed to facilitate the detection of cancer, a leading cause of death globally, and other complex diseases. In this review, we systematically evaluate the recent trends in multi-omics data analysis based on deep learning techniques and their application in disease prediction. We highlight the current challenges in the field and discuss how advances in deep learning methods and their optimization for application are vital in overcoming them. Ultimately, this review promotes the development of novel deep learning methodologies for data integration, which is essential for disease detection and treatment.

  • Article type: Journal Article
    It is well known that Artificial Intelligence (AI), and in particular Machine Learning (ML), is not effective without good data preparation, as also pointed out by the recent wave of data-centric AI. Data preparation is the process of gathering, transforming, and cleaning raw data prior to processing and analysis. Since data nowadays often reside in distributed and heterogeneous data sources, the first activity of data preparation is collecting data from suitable data sources and data services. It is thus essential that providers describe their data services in a way that makes them compliant with the FAIR guiding principles, i.e., that makes them automatically Findable, Accessible, Interoperable, and Reusable (FAIR). The notion of data abstraction has been introduced exactly to meet this need. Abstraction is a kind of reverse engineering task that automatically provides a semantic characterization of a data service made available by a provider. The goal of this paper is to review the results obtained so far in data abstraction by presenting the formal framework for its definition, reporting on the decidability and complexity of the main theoretical problems concerning abstraction, and discussing open issues and interesting directions for future research.

  • Article type: Journal Article
    With the increased number of single-cell RNA sequencing (scRNA-seq) datasets in public repositories, integrative analysis of multiple scRNA-seq datasets has become commonplace. Batch effects among different datasets are inevitable because of differences in cell isolation and handling protocols, library preparation technology, and sequencing platforms. To remove these batch effects for effective integration of multiple scRNA-seq datasets, a number of methodologies have been developed based on diverse concepts and approaches. These methods have proven useful for examining whether cellular features identified from a certain dataset, such as cell subpopulations and marker genes, are consistently present, or whether their condition-dependent variations, such as increases in cell subpopulations in particular disease-related conditions, are consistently observed in different datasets generated under similar or distinct conditions. In this review, we summarize the concepts and approaches of the integration methods and their pros and cons as reported in previous literature.
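
To make the idea of a batch effect concrete, the following is a minimal sketch in Python of the simplest possible correction: subtracting each batch's per-gene mean from the cells of that batch. It is a toy illustration only and is not one of the methods summarized in the review; the function name `center_per_batch` and the example values are assumptions made for this sketch.

```python
from collections import defaultdict


def center_per_batch(expr, batches):
    """Subtract each batch's per-gene mean from every cell in that batch.

    expr: list of cells, each a list of per-gene expression values.
    batches: list of batch labels, one per cell.
    (Hypothetical helper for illustration, not a published method.)
    """
    rows_by_batch = defaultdict(list)
    for row, batch in zip(expr, batches):
        rows_by_batch[batch].append(row)
    # Per-gene mean within each batch.
    means = {
        batch: [sum(col) / len(rows) for col in zip(*rows)]
        for batch, rows in rows_by_batch.items()
    }
    return [
        [value - mean for value, mean in zip(row, means[batch])]
        for row, batch in zip(expr, batches)
    ]


# Toy data: 4 cells x 2 genes; batch "B" carries a constant offset of 10.
expr = [[1.0, 2.0], [3.0, 4.0], [11.0, 12.0], [13.0, 14.0]]
batches = ["A", "A", "B", "B"]
corrected = center_per_batch(expr, batches)
print(corrected)
```

After centering, the two batches overlap in this toy example. Note that plain centering also erases genuine differences in cell-type composition between batches, which is one reason more elaborate integration methods exist.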

  • Article type: Journal Article
    BACKGROUND: Journey maps are visualization tools that can facilitate the diagrammatical representation of stakeholder groups by interest or function for comparative visual analysis. Therefore, journey maps can illustrate intersections and relationships between organizations and consumers using products or services. We propose that some synergies may exist between journey maps and the concept of a learning health system (LHS). The overarching goal of an LHS is to use health care data to inform clinical practice and improve service delivery processes and patient outcomes.
    OBJECTIVE: The purpose of this review was to assess the literature and establish a relationship between journey mapping techniques and LHSs. Specifically, in this study, we explored the current state of the literature to answer the following research questions: (1) Is there a relationship between journey mapping techniques and an LHS in the literature? (2) Is there a way to integrate the data from journey mapping activities into an LHS? (3) How can the data gleaned from journey map activities be used to inform an LHS?
    METHODS: A scoping review was conducted by querying the following electronic databases: Cochrane Database of Systematic Reviews (Ovid), IEEE Xplore, PubMed, Web of Science, Academic Search Complete (EBSCOhost), APA PsycInfo (EBSCOhost), CINAHL (EBSCOhost), and MEDLINE (EBSCOhost). Two researchers applied the inclusion criteria and assessed all articles by title and abstract in the first screen, using Covidence. Following this, a full-text review of included articles was done, with relevant data extracted, tabulated, and assessed thematically.
    RESULTS: The initial search yielded 694 studies. Of those, 179 duplicates were removed. Following this, 515 articles were assessed during the first screening phase, and 412 were excluded, as they did not meet the inclusion criteria. Next, 103 articles were read in full, and 95 were excluded, resulting in a final sample of 8 articles that satisfied the inclusion criteria. The article sample can be subsumed into 2 overarching themes: (1) the need to evolve service delivery models in health care, and (2) the potential value of using patient journey data in an LHS.
    CONCLUSIONS: This scoping review demonstrated a gap in knowledge regarding the integration of data from journey mapping activities into an LHS. Our findings highlighted the importance of using data from patient experiences to enrich an LHS and provide holistic care. To address this gap, the authors intend to continue this investigation to establish the relationship between journey mapping and the concept of LHSs. This scoping review will serve as phase 1 of an investigative series. Phase 2 will entail the creation of a holistic framework to guide and streamline data integration from journey mapping activities into an LHS. Lastly, phase 3 will provide a proof of concept to demonstrate how patient journey mapping activities could be integrated into an LHS.
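
The screening counts reported in the Results can be checked with a few lines of arithmetic. The sketch below, in Python, recomputes each stage from the numbers given in the abstract; the function and key names are assumptions chosen for this illustration.

```python
def screening_flow(identified, duplicates, excluded_title_abstract, excluded_full_text):
    """Recompute the number of records remaining after each screening stage."""
    screened = identified - duplicates                # records left after de-duplication
    full_text = screened - excluded_title_abstract    # records read in full
    included = full_text - excluded_full_text         # final sample
    return {"screened": screened, "full_text": full_text, "included": included}


# Counts taken from the abstract: 694 identified, 179 duplicates,
# 412 excluded at title/abstract screening, 95 excluded at full text.
flow = screening_flow(694, 179, 412, 95)
print(flow)  # {'screened': 515, 'full_text': 103, 'included': 8}
```

The recomputed values match the 515, 103, and 8 articles stated in the abstract.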

  • Article type: Journal Article
    Clinical trials for oncology drug development have long relied on surrogate outcome biomarkers that assess changes in tumor burden to accelerate drug registration (i.e., Response Evaluation Criteria in Solid Tumors version 1.1 (RECIST v1.1) criteria). Drug-induced reduction in tumor size is an imperfect surrogate marker for drug activity, and yet a radiologically determined objective response rate is a widely used endpoint for Phase 2 trials. With the addition of therapies targeting complex biological systems such as the immune system and DNA damage repair pathways, incorporation of integrative response and outcome biomarkers may add more predictive value. We performed a review of the relevant literature in four representative tumor types (breast cancer, rectal cancer, lung cancer, and glioblastoma) to assess the preparedness of volumetric and radiomics metrics as clinical trial endpoints. We identified three key areas (segmentation, validation, and data sharing strategies) where concerted efforts are required to enable progress of volumetric- and radiomics-based clinical trial endpoints toward wider clinical implementation.

  • Article type: Journal Article
    Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state of the art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods shows that, especially for intermediate fusion strategies, joint representation learning is the preferred approach, as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
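
As a rough orientation for readers new to fusion strategies, the sketch below contrasts the two simplest ends of the commonly used taxonomy in plain Python: early fusion (concatenating features before modeling) and late fusion (combining per-modality predictions). These two categories, the function names, and the toy values are assumptions for illustration; the abstract itself focuses on intermediate and gradual fusion, which require trained deep models and are not shown here.

```python
def early_fusion(modality_a, modality_b):
    """Concatenate per-sample feature vectors so one model sees all features."""
    return [a + b for a, b in zip(modality_a, modality_b)]


def late_fusion(scores_a, scores_b, weight_a=0.5):
    """Combine per-modality prediction scores with a weighted average."""
    return [weight_a * sa + (1 - weight_a) * sb for sa, sb in zip(scores_a, scores_b)]


# Toy data: two samples, an expression modality (2 features)
# and an imaging modality (1 feature). Values are made up.
expression = [[0.1, 0.9], [0.8, 0.2]]
imaging = [[1.0], [0.0]]
fused_features = early_fusion(expression, imaging)

# Scores that two hypothetical unimodal models might have produced.
fused_scores = late_fusion([0.5, 1.0], [1.0, 0.0])

print(fused_features)  # [[0.1, 0.9, 1.0], [0.8, 0.2, 0.0]]
print(fused_scores)    # [0.75, 0.5]
```

Intermediate fusion sits between the two: each modality is first encoded separately, and the learned representations are merged inside the model rather than at the input or output.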

  • Article type: Journal Article
    Metadata are created to describe the corresponding data in a detailed and unambiguous way and are used for various applications in different research areas, for example, data identification and classification. However, a clear definition of metadata is crucial for further use. Unfortunately, extensive experience with the processing and management of metadata has shown that the term "metadata" and its use are not always unambiguous.
    This study aimed to understand the definition of metadata and the challenges resulting from metadata reuse.
    A systematic literature search was performed in this study following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for reporting on systematic reviews. Five research questions were identified to streamline the review process, addressing metadata characteristics, metadata standards, use cases, and problems encountered. This review was preceded by a harmonization process to achieve a general understanding of the terms used.
    The harmonization process resulted in a clear set of definitions for metadata processing focusing on data integration. The following literature review was conducted by 10 reviewers with different backgrounds and using the harmonized definitions. This study included 81 peer-reviewed papers from the last decade after applying various filtering steps to identify the most relevant papers. The 5 research questions could be answered, resulting in a broad overview of the standards, use cases, problems, and corresponding solutions for the application of metadata in different research areas.
    Metadata can be a powerful tool for identifying, describing, and processing information, but creating meaningful metadata is costly and challenging. This review process uncovered many standards, use cases, problems, and solutions for dealing with metadata. The presented harmonized definitions and the new schema have the potential to improve the classification and generation of metadata by creating a shared understanding of metadata and their context.

  • Article type: Journal Article
    The emergence of single-cell sequencing technology enables researchers to observe cells with unprecedented precision. However, it is difficult to capture information on all cells and genes in a single single-cell RNA sequencing (scRNA-seq) experiment, and single-cell data of a single modality cannot explain cell states and system changes in detail. The integrative analysis of single-cell data aims to address these two types of problems. Integrating scRNA-seq data from multiple sources can capture a complete set of cell types and provide a powerful boost for the construction of cell atlases. Integrating single-cell multimodal data can be used to study causal relationships and gene regulation mechanisms across modalities. The development and application of data integration methods help fully explore the richness and relevance of single-cell data and discover meaningful biological changes. On this basis, this article reviews the basic principles, methods, and applications of multi-source scRNA-seq data integration and single-cell multimodal data integration, and discusses the advantages and disadvantages of existing methods. Finally, prospects for future development are outlined.

  • Article type: Journal Article
    Technological advances in high-throughput techniques have resulted in tremendous growth of complex biological datasets providing evidence regarding various biomolecular interactions. To cope with this data flood, computational approaches, web services, and databases have been implemented to deal with issues such as data integration, visualization, exploration, organization, scalability, and complexity. Nevertheless, as the number of such sets increases, it is becoming more and more difficult for an end user to know what the scope and focus of each repository are and how redundant the information between them is. Several repositories have a more general scope, while others focus on specialized aspects, such as specific organisms or biological systems. Unfortunately, many of these databases are self-contained or poorly documented and maintained. For a clearer view, in this article we provide a comprehensive categorization, comparison, and evaluation of such repositories for different bioentity interaction types. We discuss most of the publicly available services based on their content, sources of information, data representation methods, user-friendliness, scope, and interconnectivity, and we comment on their strengths and weaknesses. We aim for this review to reach a broad readership, from biomedical beginners to experts, and to serve as a reference article in the field of Network Biology.

  • Article type: Journal Article
    Thanks to improvements in care, cancer has become a chronic condition. However, because of the toxicity of treatment, supporting the quality of life (QoL) of cancer patients has become increasingly important. Monitoring and managing QoL relies on data collected by patients in their home environment and on the integration and analysis of those data, which support the personalization of cancer management recommendations. We review the state of the art of computerized systems that employ AI and data science methods to monitor health status and provide support to cancer patients managed at home.
    Our main objective is to analyze the literature to identify open research challenges that a novel decision support system for cancer patients and clinicians will need to address, point to potential solutions, and provide a list of established best-practices to adopt.
    We designed a review study, in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, analyzing studies retrieved from PubMed related to monitoring cancer patients in their home environments via sensors and self-reporting: what data are collected, and what techniques are used to collect the data, semantically integrate them, infer the patient's state from them, and deliver coaching/behavior change interventions.
    Starting from an initial corpus of 819 unique articles, a total of 180 papers were considered in the full-text analysis and 109 were finally included in the review. Our findings are organized and presented in four main sub-topics consisting of data collection, data integration, predictive modeling and patient coaching.
    Development of modern decision support systems for cancer needs to utilize best practices like the use of validated electronic questionnaires for quality-of-life assessment, adoption of appropriate information modeling standards supplemented by terminologies/ontologies, adherence to FAIR data principles, external validation, stratification of patients in subgroups for better predictive modeling, and adoption of formal behavior change theories. Open research challenges include supporting emotional and social dimensions of well-being, including PROs in predictive modeling, and providing better customization of behavioral interventions for the specific population of cancer patients.