multi-omics data integration

  • Article type: Journal Article
    BACKGROUND: Aging is a significant risk factor for cerebral small vessel disease, which is associated with white matter (WM) lesions and age-related cognitive alterations, though the precise mechanisms remain largely unknown. This study aimed to investigate the impact of polygenic risk scores (PRSs) for WM integrity, together with age-related DNA methylation and gene expression alterations, on cognitive aging in a cross-sectional healthy aging cohort. The PRSs were calculated using genome-wide association study (GWAS) summary statistics for magnetic resonance imaging (MRI) markers of WM integrity, including WM hyperintensities, fractional anisotropy (FA), and mean diffusivity (MD). These scores were used to predict age-related cognitive changes and to evaluate their correlation with the structural brain changes that distinguish individuals with higher and lower cognitive scores. To reduce the dimensionality of the data and identify age-related DNA methylation and transcriptomic alterations, Sparse Partial Least Squares-Discriminant Analysis (sPLS-DA) was used. Subsequently, a canonical correlation algorithm was used to integrate the three types of omics data (PRS, DNA methylation, and gene expression data) and identify an individual "omics" signature that distinguishes subjects with varying cognitive profiles.
    RESULTS: We found a positive association between MD-PRS and long-term memory, as well as a correlation between MD-PRS and structural brain changes, effectively discriminating between individuals with lower and higher memory scores. Furthermore, we observed an enrichment of polygenic signals in genes related to both vascular and non-vascular factors. Age-related alterations in DNA methylation and gene expression indicated dysregulation of critical molecular features and signaling pathways involved in aging and lifespan regulation. The integration of multi-omics data underscored the involvement of synaptic dysfunction, axonal degeneration, microtubule organization, and glycosylation in the process of cognitive aging.
    CONCLUSIONS: These findings provide valuable insights into the biological mechanisms underlying the association between WM coherence and cognitive aging. Additionally, they highlight how age-associated DNA methylation and gene expression changes contribute to cognitive aging.
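
    The abstract above computes PRSs from GWAS summary statistics. As a minimal sketch of the general idea only (not the study's pipeline), a PRS can be written as the sum of each variant's effect-allele dosage multiplied by its GWAS effect size. The variant IDs, effect sizes, and the function name `polygenic_risk_score` below are hypothetical and chosen purely for illustration.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float],
                         effect_sizes: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs."""
    # Only variants that appear in both the genotype and the GWAS table contribute.
    shared = dosages.keys() & effect_sizes.keys()
    return sum(effect_sizes[snp] * dosages[snp] for snp in shared)


# Hypothetical GWAS effect sizes (betas) and one person's allele dosages (0, 1, or 2).
gwas_betas = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.125}
person = {"rs_a": 2, "rs_b": 1, "rs_d": 2}

score = polygenic_risk_score(person, gwas_betas)
print(score)
```

    Here only `rs_a` and `rs_b` are shared, so the score is 0.5*2 + (-0.25)*1 = 0.75. Real pipelines typically add steps such as clumping and p-value thresholding, which this sketch omits.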

  • Article type: Journal Article
    RNA Polymerase II (Pol II) transcriptional elongation pausing is an integral part of the dynamic regulation of gene transcription in the genome of metazoans. It plays a pivotal role in many vital biological processes and disease progression. However, experimentally measuring genome-wide Pol II pausing is technically challenging and the precise governing mechanism underlying this process is not fully understood. Here, we develop RP3 (RNA Polymerase II Pausing Prediction), a network regularized logistic regression machine learning method, to predict Pol II pausing events by integrating genome sequence, histone modification, gene expression, chromatin accessibility, and protein-protein interaction data. RP3 can accurately predict Pol II pausing in diverse cellular contexts and unveil the transcription factors that are associated with the Pol II pausing machinery. Furthermore, we utilize a forward feature selection framework to systematically identify the combination of histone modification signals associated with Pol II pausing. RP3 is freely available at https://github.com/AMSSwanglab/RP3.
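
    The abstract describes RP3 as a network regularized logistic regression. The sketch below illustrates one common form of that idea and is not RP3's actual implementation: a logistic loss plus a graph penalty `lam * sum((w[i] - w[j])**2)` over network edges, which pulls the coefficients of linked features toward each other. The penalty form, the function name `fit_network_logreg`, the toy data, and all hyperparameters are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_network_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                       edges: Sequence[Tuple[int, int]],
                       lam: float = 0.5, lr: float = 0.1,
                       epochs: int = 2000) -> List[float]:
    """Gradient descent on mean logistic loss + lam * sum over edges of (w_i - w_j)^2."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [0.0] * p
        # Gradient of the mean logistic loss.
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(p):
                grad[j] += err * row[j] / n
        # Gradient of the graph penalty (assumed form): linked weights attract.
        for i, j in edges:
            diff = w[i] - w[j]
            grad[i] += 2.0 * lam * diff
            grad[j] -= 2.0 * lam * diff
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data: feature 0 determines the label, feature 1 is uninformative,
# and a hypothetical network edge links the two features.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]
edges = [(0, 1)]

w_free = fit_network_logreg(X, y, edges, lam=0.0)  # no network penalty
w_reg = fit_network_logreg(X, y, edges, lam=0.5)   # with network penalty
print(w_free, w_reg)
```

    Comparing `w_free` with `w_reg` shows the effect of the penalty: with the edge in place, the two coefficients end up much closer together than in the unpenalized fit.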

  • Article type: Journal Article
    We present an innovative strategy for integrating whole-genome-wide multi-omics data, which facilitates adaptive amalgamation by leveraging hidden layer features derived from high-dimensional omics data through a multi-task encoder. Empirical evaluations on eight benchmark cancer datasets substantiated that our proposed framework outstripped the comparative algorithms in cancer subtyping, delivering superior subtyping outcomes. Building upon these subtyping results, we establish a robust pipeline for identifying whole-genome-wide biomarkers, unearthing 195 significant biomarkers. Furthermore, we conduct an exhaustive analysis to assess the importance of each omic and non-coding region features at the whole-genome-wide level during cancer subtyping. Our investigation shows that both omics and non-coding region features substantially impact cancer development and survival prognosis. This study emphasizes the potential and practical implications of integrating genome-wide data in cancer research, demonstrating the potency of comprehensive genomic characterization. Additionally, our findings offer insightful perspectives for multi-omics analysis employing deep learning methodologies.

  • Article type: Journal Article
    Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
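
    The "local connections" between genes/miRNAs and pathways described above can be pictured as a sparsely connected layer in which each pathway node receives input only from its member genes. The following is a minimal sketch of that general technique under stated assumptions, not DeepKEGG's code: the gene and pathway names, the weights, the ReLU activation, and the function name `pathway_activations` are all hypothetical.

```python
from typing import Dict, List, Tuple


def pathway_activations(gene_values: Dict[str, float],
                        pathways: Dict[str, List[str]],
                        weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """One locally connected layer: a pathway node sums only over its member genes."""
    out = {}
    for pathway, members in pathways.items():
        # Weights for (gene, pathway) pairs outside the membership list are never read,
        # which is what makes the layer sparse and each node interpretable.
        total = sum(weights.get((gene, pathway), 0.0) * gene_values.get(gene, 0.0)
                    for gene in members)
        out[pathway] = max(0.0, total)  # ReLU activation (an assumption)
    return out


# Hypothetical expression values, pathway membership, and weights.
gene_values = {"gene_1": 1.0, "gene_2": 2.0, "gene_3": 0.5}
pathways = {"pathway_A": ["gene_1", "gene_2"], "pathway_B": ["gene_3"]}
weights = {
    ("gene_1", "pathway_A"): 0.5,
    ("gene_2", "pathway_A"): 0.25,
    ("gene_3", "pathway_B"): -1.0,
    ("gene_2", "pathway_B"): 9.0,  # ignored: gene_2 is not a member of pathway_B
}

activations = pathway_activations(gene_values, pathways, weights)
print(activations)
```

    Because every pathway node depends only on a known gene set, an attribution score assigned to a node can be traced back to specific genes, which is the interpretability argument the abstract makes.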

  • Article type: Journal Article
    OBJECTIVE: In patients with heart failure (HF), concomitant sinus node dysfunction (SND) is an important predictor of mortality, yet its molecular underpinnings are poorly understood. Using proteomics, this study aimed to dissect the protein and phosphorylation remodelling within the sinus node in an animal model of HF with concurrent SND.
    RESULTS: We acquired deep sinus node proteomes and phosphoproteomes in mice with heart failure and SND and report extensive remodelling. Intersecting the measured (phospho)proteome changes with human genomics pharmacovigilance data highlighted downregulated proteins involved in electrical activity, such as the pacemaker ion channel Hcn4. We confirmed the importance of ion channel downregulation for sinus node physiology using computer modelling. Guided by the proteomics data, we hypothesized that an inflammatory response may drive the electrophysiological remodelling underlying SND in heart failure. In support of this, experimentally induced inflammation downregulated Hcn4 and slowed pacemaking in the isolated sinus node. From the proteomics data we identified the proinflammatory cytokine-like protein galectin-3 as a potential target to mitigate the effect. Indeed, in vivo suppression of galectin-3 in the animal model of heart failure prevented SND.
    CONCLUSIONS: Collectively, we outline the protein and phosphorylation remodelling of SND in heart failure, we highlight a role for inflammation in electrophysiological remodelling of the sinus node, and we present galectin-3 signalling as a target to ameliorate SND in heart failure.

  • Article type: Journal Article
    Multi-omics data play a crucial role in precision medicine, mainly in understanding the diverse biological interactions between different omics. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence, this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.

  • Article type: Journal Article
    BACKGROUND: Classifying breast cancer subtypes is crucial for clinical diagnosis and treatment. However, the early symptoms of breast cancer may not be apparent. Rapid advances in high-throughput sequencing technology have generated large amounts of multi-omics biological data. Leveraging and integrating the available multi-omics data can effectively enhance the accuracy of identifying breast cancer subtypes. However, few efforts have focused on identifying the associations among different omics data to predict breast cancer subtypes.
    RESULTS: In this paper, we propose a differential sparse canonical correlation analysis network (DSCCN) for classifying breast cancer subtypes. DSCCN performs differential analysis on multi-omics expression data to identify differentially expressed (DE) genes and adopts sparse canonical correlation analysis (SCCA) to mine highly correlated features between multi-omics DE genes. Meanwhile, DSCCN uses a multi-task deep learning neural network to train separately on the correlated DE genes to predict breast cancer subtypes, which spontaneously tackles the data heterogeneity problem in integrating multi-omics data.
    CONCLUSIONS: The experimental results show that by mining the associations among multi-omics data, DSCCN is more capable of accurately classifying breast cancer subtypes than the existing methods.
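
    To make the SCCA step above concrete, here is a simplified sketch of sparse canonical correlation between two data blocks. It alternates soft-thresholded power iterations on the cross-covariance matrix, in the spirit of penalized matrix decomposition, with fixed thresholds instead of a tuned L1 constraint. This is an illustrative assumption, not the DSCCN implementation; the toy matrices, the thresholds, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def soft_threshold(values: Sequence[float], lam: float) -> Vector:
    """Shrink each entry toward zero by lam; entries smaller than lam become zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


def normalize(values: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def cross_covariance(X: Sequence[Sequence[float]],
                     Y: Sequence[Sequence[float]]) -> List[Vector]:
    """Sample cross-covariance between the columns of X and the columns of Y."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    mx = [sum(row[a] for row in X) / n for a in range(p)]
    my = [sum(row[b] for row in Y) / n for b in range(q)]
    return [[sum((X[i][a] - mx[a]) * (Y[i][b] - my[b]) for i in range(n)) / (n - 1)
             for b in range(q)] for a in range(p)]


def sparse_cca(X, Y, lam_u: float, lam_v: float, iters: int = 50) -> Tuple[Vector, Vector]:
    """Return sparse weight vectors (u, v) for the first pair of canonical variates."""
    C = cross_covariance(X, Y)
    p, q = len(C), len(C[0])
    u, v = [0.0] * p, normalize([1.0] * q)
    for _ in range(iters):
        u = normalize(soft_threshold(
            [sum(C[a][b] * v[b] for b in range(q)) for a in range(p)], lam_u))
        v = normalize(soft_threshold(
            [sum(C[a][b] * u[a] for a in range(p)) for b in range(q)], lam_v))
    return u, v


# Toy blocks: the first column of X and the first column of Y share a strong signal,
# while the remaining columns are low-amplitude noise.
X = [[-1.5, 0.1, 0.1], [-0.5, -0.1, -0.1], [0.5, -0.1, 0.1], [1.5, 0.1, -0.1]]
Y = [[-3.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [3.0, -0.1]]

u, v = sparse_cca(X, Y, lam_u=0.5, lam_v=0.5)
print(u, v)
```

    On this toy data both weight vectors collapse onto the first feature of each block, and the noise features receive zero weight. In a pipeline like the one described, features with non-zero weights would be the "highly correlated" ones passed on to the classifier.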

  • Article type: Journal Article
    With the rapid advance of single-cell sequencing technology, cell heterogeneity in various biological processes has been dissected at different omics levels. However, single-cell mono-omics yields fragmented information and cannot capture complete cell states. In the past several years, a variety of single-cell multimodal omics technologies have been developed to jointly profile multiple molecular modalities, including genome, transcriptome, epigenome, and proteome, from the same single cell. With the availability of single-cell multimodal omics data, we can simultaneously investigate the effects of genomic mutation or epigenetic modification on transcription and translation, and reveal the potential mechanisms underlying disease pathogenesis. Driven by the massive volume of single-cell omics data, integration methods for single-cell multi-omics data have developed rapidly. Future integration of the massive multi-omics single-cell data in public databases will make it possible to construct a multi-omics cell atlas, enabling us to comprehensively understand cell state and gene regulation at single-cell resolution. In this review, we summarize the experimental methods for generating single-cell multimodal omics data and the computational methods for multi-omics data integration. We also discuss the future development of this field.

  • Article type: Journal Article
    Cancer is a complex and high-mortality disease regulated by multiple factors. Accurate cancer subtyping is crucial for formulating personalized treatment plans and improving patient survival rates. The underlying mechanisms that drive cancer progression can be comprehensively understood by analyzing multi-omics data. However, the high noise levels in omics data often pose challenges in capturing consistent representations and adequately integrating their information. This paper proposes a novel variational autoencoder-based deep learning model, named Deeply Integrating Latent Consistent Representations (DILCR). First, multiple independent variational autoencoders and contrastive loss functions are designed to separate noise from omics data and capture latent consistent representations. Subsequently, an Attention Deep Integration Network is proposed to effectively integrate consistent representations across different omics levels. Additionally, an Improved Deep Embedded Clustering algorithm is introduced to make the integrated variables clustering-friendly. The effectiveness of DILCR was evaluated using 10 typical cancer datasets from The Cancer Genome Atlas and compared with 14 state-of-the-art integration methods. The results demonstrated that DILCR effectively captures the consistent representations in omics data and outperforms other integration methods in cancer subtyping. In the Kidney Renal Clear Cell Carcinoma case study, the cancer subtypes identified by DILCR showed significant biological relevance and interpretability.

  • Article type: Journal Article
    Midbrain dopaminergic neurons (mDANs) control voluntary movement, cognition, and reward behavior under physiological conditions and are implicated in human diseases such as Parkinson's disease (PD). Many transcription factors (TFs) controlling human mDAN differentiation during development have been described, but much of the regulatory landscape remains undefined. Using a tyrosine hydroxylase (TH) human iPSC reporter line, we here generate time series transcriptomic and epigenomic profiles of purified mDANs during differentiation. Integrative analysis predicts novel regulators of mDAN differentiation and super-enhancers are used to identify key TFs. We find LBX1, NHLH1 and NR2F1/2 to promote mDAN differentiation and show that overexpression of either LBX1 or NHLH1 can also improve mDAN specification. A more detailed investigation of TF targets reveals that NHLH1 promotes the induction of neuronal miR-124, LBX1 regulates cholesterol biosynthesis, and NR2F1/2 controls neuronal activity.