Data integration

  • Article type: Editorial
    No abstract available.

  • Article type: Journal Article
    Constructing gene regulatory networks is a widely adopted approach for investigating gene regulation, with diverse applications in biology and medicine. A great deal of research focuses on using time series data or single-cell RNA-sequencing data to infer gene regulatory networks. However, such gene expression data lack either cellular or temporal information. Fortunately, the advent of time-lapse confocal laser microscopy enables biologists to obtain tree-shaped gene expression data of Caenorhabditis elegans, achieving both cellular and temporal resolution. Although such tree-shaped data provide abundant knowledge, they pose challenges, such as non-pairwise time series, that can make downstream analysis inaccurate. To address this issue, a comprehensive framework for data integration and a novel Bayesian approach based on a Boolean network with time delay are proposed. A pre-screening process and a Markov Chain Monte Carlo algorithm are applied to obtain the parameter estimates. Simulation studies show that our method outperforms existing Boolean network inference algorithms. Leveraging the proposed approach, gene regulatory networks for five subtrees are reconstructed based on the real tree-shaped datasets of Caenorhabditis elegans, where some gene regulatory relationships confirmed in previous genetic studies are recovered. In addition, heterogeneity of regulatory relationships across different cell lineage subtrees is detected. Furthermore, potential gene regulatory relationships that bear importance in human diseases are explored. All source code is available at the GitHub repository https://github.com/edawu11/BBTD.git.
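
    To make the phrase "Boolean network with time delay" concrete, the following is a minimal sketch of how such a network can be simulated forward in time. It is not the Bayesian inference method of the abstract and is not taken from the BBTD repository; the genes A, B, and C, their update rules, and the delays are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
# A rule is (update function, [(regulator name, delay in time steps), ...]).
# A delay of 1 means "use the regulator's value at the previous time point".
Rule = Tuple[Callable[..., bool], List[Tuple[str, int]]]


def simulate(rules: Dict[str, Rule], history: List[State], steps: int) -> List[State]:
    """Extend `history` by `steps` synchronous updates of a delayed Boolean network."""
    max_delay = max(delay for _, regs in rules.values() for _, delay in regs)
    if len(history) < max_delay:
        raise ValueError("history must contain at least max_delay states")
    trajectory = list(history)
    for _ in range(steps):
        next_state: State = {}
        for gene, (update, regulators) in rules.items():
            # trajectory[-delay] is the state `delay` steps before the new one.
            inputs = [trajectory[-delay][reg] for reg, delay in regulators]
            next_state[gene] = update(*inputs)
        trajectory.append(next_state)
    return trajectory


# Hypothetical toy network (an assumption, not from the paper):
#   A(t+1) = NOT C(t-1)          C represses A with delay 2
#   B(t+1) = A(t)                A activates B with delay 1
#   C(t+1) = A(t) AND B(t-1)     A (delay 1) and B (delay 2) jointly activate C
rules: Dict[str, Rule] = {
    "A": (lambda c: not c, [("C", 2)]),
    "B": (lambda a: a, [("A", 1)]),
    "C": (lambda a, b: a and b, [("A", 1), ("B", 2)]),
}
history: List[State] = [
    {"A": True, "B": False, "C": False},
    {"A": True, "B": False, "C": False},
]
trajectory = simulate(rules, history, steps=3)
for t, state in enumerate(trajectory):
    print(t, state)
```

    In this sketch the delay is part of each regulatory edge, so inferring a network from data would mean estimating both the update functions and the delays.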

  • Article type: Journal Article
    Deciphering the intricate relationships between transcription factors (TFs), enhancers, and genes through the inference of enhancer-driven gene regulatory networks (eGRNs) is crucial in understanding gene regulatory programs in a complex biological system. This study introduces STREAM, a novel method that leverages a Steiner forest problem model, a hybrid biclustering pipeline, and submodular optimization to infer eGRNs from jointly profiled single-cell transcriptome and chromatin accessibility data. Compared to existing methods, STREAM demonstrates enhanced performance in terms of TF recovery, TF-enhancer linkage prediction, and enhancer-gene relation discovery. Application of STREAM to an Alzheimer's disease dataset and a diffuse small lymphocytic lymphoma dataset reveals its ability to identify TF-enhancer-gene relations associated with pseudotime, as well as key TF-enhancer-gene relations and TF cooperation underlying tumor cells.

  • Article type: Journal Article
    Background: Integrative analysis of spatially resolved transcriptomics datasets empowers a deeper understanding of complex biological systems. However, integrating multiple tissue sections presents challenges for batch effect removal, particularly when the sections are measured by various technologies or collected at different times.
    Results: We propose spatiAlign, an unsupervised contrastive learning model that employs the expression of all measured genes and the spatial location of cells to integrate multiple tissue sections. It enables the joint downstream analysis of multiple datasets not only in low-dimensional embeddings but also in the reconstructed full expression space.
    Conclusions: In benchmarking analysis, spatiAlign outperforms state-of-the-art methods in learning joint and discriminative representations for tissue sections, each potentially characterized by complex batch effects or distinct biological characteristics. Furthermore, we demonstrate the benefits of spatiAlign for the integrative analysis of time-series brain sections, including spatial clustering, differential expression analysis, and particularly trajectory inference that requires a corrected gene expression matrix.

  • Article type: Journal Article
    Epigenetics refers to heritable changes in gene expression and function that impact nuclear processes associated with chromatin, all without altering DNA sequences. These epigenetic patterns, being heritable traits, are vital biological mechanisms that intricately regulate gene expression and heredity. The application of chemical labeling and single-cell resolution mapping strategies has significantly facilitated large-scale epigenetic modifications in nucleic acids over recent years. Notably, epigenetic modifications can induce heritable phenotypic changes, regulate cell differentiation, influence cell-specific gene expression, parentally imprint genes, activate the X chromosome, and stabilize genome structure. Given their reversibility and susceptibility to environmental factors, epigenetic modifications have gained prominence in disease diagnosis, significantly impacting clinical medicine research. Recent studies have uncovered strong links between epigenetic modifications and the pathogenesis of metabolic cardiovascular diseases, including congenital heart disease, heart failure, cardiomyopathy, hypertension, and atherosclerosis. In this review, we provide an overview of the progress in epigenetic research within the context of cardiovascular diseases, encompassing their pathogenesis, prevention, diagnosis, and treatment. Furthermore, we shed light on the potential prospects of nucleic acid epigenetic modifications as a promising avenue in clinical medicine and biomedical applications.

  • Article type: Journal Article
    Plant morphogenesis relies on precise gene expression programs at the proper time and position, which are orchestrated by transcription factors (TFs) in intricate regulatory networks in a cell-type-specific manner. Here we introduce a comprehensive single-cell transcriptomic atlas of Arabidopsis seedlings. This atlas is the result of meticulous integration of 63 previously published scRNA-seq datasets, addressing batch effects and conserving biological variance. This integration spans a broad spectrum of tissues, including both below- and above-ground parts. Utilizing a rigorous approach for cell type annotation, we identified 47 distinct cell types or states, largely expanding our current view of plant cell compositions. We systematically constructed cell-type-specific gene regulatory networks and uncovered key regulators that act in a coordinated manner to control cell-type-specific gene expression. Taken together, our study not only offers an extensive plant cell atlas that serves as a valuable resource, but also provides molecular insights into gene regulatory programs that vary across cell types.

  • Article type: Journal Article
    Numerous single-cell transcriptomic datasets from identical tissues or cell lines are generated by different laboratories or single-cell RNA sequencing (scRNA-seq) protocols. Denoising these datasets to eliminate batch effects is crucial for data integration, ensuring accurate interpretation and comprehensive analysis of biological questions. Although many scRNA-seq data integration methods exist, most are inefficient and/or not conducive to downstream analysis. Here, DeepBID is introduced, a novel deep learning-based method that concurrently performs batch effect correction, non-linear dimensionality reduction, embedding, and cell clustering. DeepBID utilizes a negative binomial-based autoencoder with dual Kullback-Leibler divergence loss functions, aligning cell points from different batches within a consistent low-dimensional latent space and progressively mitigating batch effects through iterative clustering. Extensive validation on multiple-batch scRNA-seq datasets demonstrates that DeepBID surpasses existing tools in removing batch effects and achieving superior clustering accuracy. When integrating multiple scRNA-seq datasets from patients with Alzheimer's disease, DeepBID significantly improves cell clustering, effectively annotates unidentified cells, and detects cell-specific differentially expressed genes.
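
    The two quantities named in this abstract, a negative binomial likelihood for count data and the Kullback-Leibler divergence, can be sketched in a few lines. This is a generic illustration under stated assumptions, not DeepBID's implementation: the mean/dispersion parameterization (`mu`, `theta`) is one common choice for scRNA-seq counts, and how the abstract's method combines these terms into its loss is not specified here.

```python
import math
from typing import Sequence


def nb_log_likelihood(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial distribution.

    Assumed parameterization: mean mu > 0 and dispersion theta > 0, so that
    the variance is mu + mu**2 / theta.
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support.

    Entries with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Sanity check: the probabilities should sum to (almost) 1 over a long range.
total_probability = sum(math.exp(nb_log_likelihood(x, 3.0, 2.0)) for x in range(200))

# Hypothetical soft cluster assignments of one cell under two models.
divergence = kl_divergence([0.5, 0.5], [0.9, 0.1])
print(total_probability, divergence)
```

    In an autoencoder setting, the negative of `nb_log_likelihood` summed over genes would typically serve as a reconstruction loss, while a KL term compares two probability distributions, for example cluster assignments.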

  • Article type: Journal Article
    This study employs bibliometric and visual analysis to elucidate global research trends in Autism Spectrum Disorder (ASD) biomarkers, identify critical research focal points, and discuss the potential integration of diverse biomarker modalities for precise ASD assessment.
    A comprehensive bibliometric analysis was conducted using data from the Web of Science Core Collection database until December 31, 2022. Visualization tools, including R, VOSviewer, CiteSpace, and gCLUTO, were utilized to examine collaborative networks, co-citation patterns, and keyword associations among countries, institutions, authors, journals, documents, and keywords.
    ASD biomarker research emerged in 2004, accumulating a corpus of 4348 documents by December 31, 2022. The United States, with 1574 publications and an H-index of 213, emerged as the most prolific and influential country. The University of California, Davis, contributed significantly with 346 publications and an H-index of 69, making it the leading institution. Concerning journals, the Journal of Autism and Developmental Disorders, Autism Research, and PLOS ONE were the top three publishers of ASD biomarker-related articles among a total of 1140 academic journals. Co-citation and keyword analyses revealed research hotspots in genetics, imaging, oxidative stress, neuroinflammation, gut microbiota, and eye tracking. Emerging topics included "DNA methylation," "eye tracking," "metabolomics," and "resting-state fMRI."
    The field of ASD biomarker research is dynamically evolving. Future endeavors should prioritize individual stratification, methodological standardization, the harmonious integration of biomarker modalities, and longitudinal studies to advance the precision of ASD diagnosis and treatment.
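
    The H-index values reported in this abstract follow the standard definition: the largest h such that h publications each have at least h citations. A minimal sketch of that calculation follows; the citation counts in the example are made up for illustration and do not come from the study.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts for five papers.
example_h = h_index([10, 8, 5, 4, 3])
print(example_h)
```

    Applied to a country or institution, the same function would take the citation counts of all publications attributed to it.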

  • Article type: Journal Article
    Recent advances in high-throughput proteomic profiling technologies have facilitated the precise quantification of numerous proteins across multiple specimens concurrently. Researchers have the opportunity to comprehensively analyze the molecular signatures in plentiful medical specimens or disease pattern cell lines. Along with advances in data analysis and integration, proteomics data could be efficiently consolidated and employed to recognize precise elementary molecular mechanisms and decode individual biomarkers, guiding the precision treatment of tumors. Herein, we review a broad array of proteomics technologies and the progress and methods for the integration of proteomics data and further discuss how to better merge proteomics in precision medicine and clinical settings.

  • Article type: Journal Article
    Genomic data serve as an invaluable resource for unraveling the intricacies of the higher plant systems, including the constituent elements within and among species. Through various efforts in genomic data archiving, integrative analysis and value-added curation, the National Genomics Data Center (NGDC), which is a part of the China National Center for Bioinformation (CNCB), has successfully established and currently maintains a vast amount of database resources. This dedicated initiative of the NGDC facilitates a data-rich ecosystem that greatly strengthens and supports genomic research efforts. Here, we present a comprehensive overview of central repositories dedicated to archiving, presenting, and sharing plant omics data, introduce knowledgebases focused on variants or gene-based functional insights, highlight species-specific multiple omics database resources, and briefly review the online application tools. We intend that this review can be used as a guide map for plant researchers wishing to select effective data resources from the NGDC for their specific areas of study.
    The online version contains supplementary material available at 10.1007/s42994-023-00134-4.