CP: Systems biology

  • Article type: Journal Article
    Cell reprogramming, which guides the conversion between cell states, is a promising technology for tissue repair and regeneration, with the ultimate goal of accelerating recovery from diseases or injuries. To accomplish this, regulators must be identified and manipulated to control cell fate. We propose Fatecode, a computational method that predicts cell fate regulators based only on single-cell RNA sequencing (scRNA-seq) data. Fatecode learns a latent representation of the scRNA-seq data using a deep learning-based classification-supervised autoencoder and then performs in silico perturbation experiments on the latent representation to predict genes that, when perturbed, would alter the original cell type distribution to increase or decrease the population size of a cell type of interest. We assessed Fatecode's performance using simulations from a mechanistic gene-regulatory network model and scRNA-seq data mapping blood and brain development of different organisms. Our results suggest that Fatecode can detect known cell fate regulators from single-cell transcriptomics datasets.
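    The latent-space perturbation idea can be illustrated with a minimal sketch. It assumes a hypothetical trained model reduced to linear pieces: a classifier head (`weights`, `bias`) that maps latent codes to cell types, and a decoder matrix that maps latent units to genes. Fatecode itself uses a deep classification-supervised autoencoder, so this shows only the shape of the computation, not the published implementation. All function names are illustrative.

```python
import numpy as np

def predicted_fractions(latent, weights, bias):
    """Fraction of cells assigned to each cell type by a linear classifier head on the latent codes."""
    labels = (latent @ weights + bias).argmax(axis=1)  # (n_cells,)
    return np.bincount(labels, minlength=weights.shape[1]) / latent.shape[0]

def perturbation_effect(latent, weights, bias, dim, delta, target):
    """Change in the target cell-type fraction after shifting latent dimension `dim` by `delta`."""
    before = predicted_fractions(latent, weights, bias)[target]
    shifted = latent.copy()
    shifted[:, dim] += delta  # in silico perturbation of one latent unit
    after = predicted_fractions(shifted, weights, bias)[target]
    return after - before

def top_genes_for_dim(decoder, dim, gene_names, top_k=5):
    """Genes whose decoded expression responds most strongly to latent dimension `dim`.

    decoder: (latent_dim, n_genes) matrix of a hypothetical linear decoder.
    """
    order = np.argsort(-np.abs(decoder[dim]))
    return [gene_names[i] for i in order[:top_k]]
```

    Latent dimensions whose shift most increases or decreases the target fraction can then be mapped back to candidate regulator genes through the decoder.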

  • Article type: Journal Article
    In single-cell RNA sequencing (scRNA-seq) studies, cell types and their marker genes are often identified by clustering and differentially expressed gene (DEG) analysis. A common practice is to select genes using surrogate criteria such as variance and deviance, cluster cells using the selected genes, and then detect markers by DEG analysis under the assumption that the cell types are known. The surrogate criteria can miss important genes or select unimportant ones, while DEG analysis suffers from a selection-bias problem. We present Festem, a statistical method for the direct selection of cell-type markers for downstream clustering. Festem distinguishes marker genes whose distribution across cells is heterogeneous and therefore informative for clustering. Simulations and scRNA-seq applications demonstrate that Festem selects markers sensitively and with high precision, and enables the identification of cell types often missed by other methods. In a large intrahepatic cholangiocarcinoma dataset, we identify diverse CD8+ T cell types and potential prognostic marker genes.
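    Instead of ranking genes by a surrogate such as variance, one can ask directly whether a gene's counts look like a mixture of cell populations. The sketch below compares a single Poisson against a two-component Poisson mixture fitted by EM and returns a likelihood-ratio statistic. Festem's actual test, count model, and use of cluster information are not described here, so the Poisson mixture is a deliberately simplified stand-in. The statistic has a non-standard null distribution, so it is used only for ranking and no p-value is computed. Function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(x, lam):
    """Elementwise Poisson log-likelihood of counts x under rate lam."""
    return x * np.log(lam) - lam - gammaln(x + 1)

def mixture_lr_stat(counts, n_iter=200):
    """2 * (log-lik of 2-component Poisson mixture - log-lik of single Poisson) for one gene."""
    x = np.asarray(counts, dtype=float)
    ll_null = poisson_loglik(x, max(x.mean(), 1e-12)).sum()

    # Initialise the two component rates at the lower and upper quartiles.
    lam = np.array([np.percentile(x, 25), np.percentile(x, 75)]) + 1e-3
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each cell.
        log_r = np.log(pi) + poisson_loglik(x[:, None], lam[None, :])
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and component rates.
        nk = r.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        lam = (r * x[:, None]).sum(axis=0) / nk + 1e-12

    comp = np.log(pi) + poisson_loglik(x[:, None], lam[None, :])
    ll_alt = np.logaddexp.reduce(comp, axis=1).sum()
    return 2.0 * (ll_alt - ll_null)
```

    Genes with a bimodal pattern (for example, high in one cell type and near zero elsewhere) score far above genes whose counts come from one homogeneous population.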

  • Article type: Journal Article
    Deep-learning tools that extract prognostic factors from multi-omics data have recently contributed to individualized predictions of survival outcomes. However, the limited size of integrated omics-imaging-clinical datasets poses challenges. Here, we propose two biologically interpretable and robust deep-learning architectures for survival prediction of non-small cell lung cancer (NSCLC) patients, learning simultaneously from computed tomography (CT) scan images, gene expression data, and clinical information. The proposed models integrate patient-specific clinical, transcriptomic, and imaging data and incorporate Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway information, adding biological knowledge within the learning process to extract prognostic gene biomarkers and molecular pathways. While both models accurately stratify patients into high- and low-risk groups when trained on a dataset of only 130 patients, introducing a cross-attention mechanism in a sparse autoencoder significantly improves performance, highlighting tumor regions and NSCLC-related genes as potential biomarkers and thus offering a significant methodological advancement when learning from small imaging-omics-clinical samples.
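    The cross-attention mechanism mentioned above can be sketched as single-head scaled dot-product attention in which one modality queries another. The assumption here is that gene or pathway embeddings act as queries over CT image-patch embeddings. Which modality plays which role, the dimensions, and how the block sits inside the sparse autoencoder are not given in the abstract, so the projection matrices and shapes are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d_q)   e.g. gene/pathway embeddings for one patient (assumption)
    context: (n_c, d_c)   e.g. CT image-patch embeddings for the same patient (assumption)
    w_q: (d_q, d_k), w_k: (d_c, d_k), w_v: (d_c, d_v) projection matrices (learned in a real model)
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each query's weights over image patches sum to 1
    return attn @ v, attn
```

    The attention matrix is what makes such a model inspectable: large weights indicate which image regions each molecular feature attends to.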

  • Article type: Journal Article
    Gene co-expression analysis of single-cell transcriptomes, aiming to define functional relationships between genes, is challenging due to excessive dropout values. Here, we developed a single-cell graphical Gaussian model (SingleCellGGM) algorithm to conduct single-cell gene co-expression network analysis. When applied to mouse single-cell datasets, SingleCellGGM constructed networks from which gene co-expression modules with highly significant functional enrichment were identified. We considered the modules as gene expression programs (GEPs). These GEPs enable direct cell-type annotation of individual cells without cell clustering, and they are enriched with genes required for the functions of the corresponding cells, sometimes at levels greater than 10-fold. The GEPs are conserved across datasets and enable universal cell-type label transfer across different studies. We also proposed a dimension-reduction method through averaging by GEPs for single-cell analysis, enhancing the interpretability of results. Thus, SingleCellGGM offers a unique GEP-based perspective to analyze single-cell transcriptomes and reveals biological insights shared by different single-cell datasets.
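    A graphical Gaussian model links genes by partial correlation, meaning correlation after conditioning on all other genes. This separates direct co-expression from correlation that is passed through an intermediate gene. The sketch computes partial correlations from a ridge-regularised inverse covariance matrix, which is only feasible for small gene sets. The abstract does not describe how SingleCellGGM scales this to whole transcriptomes, so treat that part as out of scope. A helper for the GEP-averaging dimension reduction is included, assuming a module is simply a list of gene column indices. Both function names are hypothetical.

```python
import numpy as np

def partial_correlations(expr, ridge=1e-3):
    """Partial correlation matrix from a cells x genes expression matrix.

    Uses the identity pcor_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision matrix.
    """
    cov = np.cov(expr, rowvar=False)
    cov += ridge * np.eye(cov.shape[0])  # small ridge keeps the inverse well-conditioned
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def gep_scores(expr, modules):
    """Reduce cells x genes to cells x modules by averaging expression within each gene module."""
    return np.column_stack([expr[:, idx].mean(axis=1) for idx in modules])
```

    In a chain A → B → C, A and C are correlated, but their partial correlation given B is near zero, so no direct A–C edge would be drawn.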

  • Article type: Journal Article
    The cellular components of tumors and their microenvironment play pivotal roles in tumor progression, patient survival, and the response to cancer treatments. Unveiling a comprehensive cellular profile within bulk tumors via single-cell RNA sequencing (scRNA-seq) data is crucial, as it reveals intrinsic tumor cellular traits that elude conventional cancer subtyping methods. Our contribution, scBeacon, is a tool that derives cell-type signatures by integrating and clustering multiple scRNA-seq datasets; these signatures are then used to deconvolve unrelated bulk tumor datasets. By applying scBeacon to The Cancer Genome Atlas (TCGA) cohort, we find cellular and molecular attributes within specific tumor categories, many of them relevant to patient outcome. We developed a tumor cell-type map to visually depict the relationships among TCGA samples based on the cell-type inferences.
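    Signature-based deconvolution treats a bulk expression profile as a non-negative mixture of cell-type signatures. The sketch uses non-negative least squares (`scipy.optimize.nnls`) and normalises the weights to proportions. This is a common generic choice and is swapped in here because the abstract does not state which deconvolution procedure scBeacon applies. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def deconvolve(bulk, signatures):
    """Estimate cell-type proportions of one bulk sample.

    bulk: (n_genes,) expression of the bulk sample
    signatures: (n_genes, n_types) cell-type signature matrix (e.g. derived from scRNA-seq clusters)
    """
    weights, _residual_norm = nnls(signatures, bulk)
    total = weights.sum()
    return weights / total if total > 0 else weights
```

    When the bulk profile is an exact mixture of the signatures and the signature columns are linearly independent, the true proportions are recovered.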

  • Article type: Journal Article
    Cancer of unknown primary (CUP) represents metastatic cancer whose primary site remains unidentified despite standard diagnostic procedures. To determine the tumor origin in such cases, we developed BPformer, a deep learning method that integrates the transformer model with prior knowledge of biological pathways. Trained on transcriptomes from 10,410 primary tumors across 32 cancer types, BPformer achieved accuracies of 94% for primary tumors and of 92% and 89% for the primary and metastatic sites of metastatic tumors, respectively, surpassing existing methods. Additionally, BPformer was validated in a retrospective study, showing consistency with tumor sites diagnosed through immunohistochemistry and histopathology. Furthermore, BPformer was able to rank pathways by their contribution to tumor-origin identification, which helped classify oncogenic signaling pathways into those that are highly conserved across different cancers and those that are highly variable depending on tumor origin.
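    Two steps implied above can be sketched generically. The first turns gene expression into pathway-level tokens, here by averaging the member genes present in each pathway. The second ranks pathways by occlusion, i.e. how much the predicted probability of a tissue of origin drops when a pathway token is zeroed. Neither the token construction nor the ranking method is specified in the abstract, so both are assumptions. `predict_proba` stands for any trained classifier (hypothetical). The pathway gene lists in the usage note are toy examples, not KEGG or Reactome definitions.

```python
import numpy as np

def pathway_tokens(expr, pathways, gene_index):
    """One scalar token per pathway: mean expression of its member genes present in the sample."""
    tokens = []
    for genes in pathways.values():
        idx = [gene_index[g] for g in genes if g in gene_index]
        tokens.append(expr[idx].mean() if idx else 0.0)
    return np.array(tokens, dtype=float)

def occlusion_importance(tokens, predict_proba, target):
    """Drop in P(target) when each pathway token is zeroed; larger means more important."""
    base = predict_proba(tokens)[target]
    scores = []
    for i in range(len(tokens)):
        masked = tokens.copy()
        masked[i] = 0.0
        scores.append(base - predict_proba(masked)[target])
    return np.array(scores)
```

    For example, with `pathways = {"p53": ["TP53", "MDM2"], "MAPK": ["KRAS", "EGFR"]}` and a sample lacking MDM2, the p53 token is just the TP53 value. A real transformer would embed these tokens and attend across them, rather than use them directly.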

  • Article type: Journal Article
    Plasma cell-free DNA (cfDNA) fragmentation patterns are an emerging direction in cancer liquid biopsy with high translational significance. Conventionally, cfDNA sequencing reads are aligned to a reference genome to extract their fragmentomic features. In this study, by profiling cfDNA fragmentomics on the same datasets in parallel using different reference genomes, we report systematic biases in such conventional reference-based approaches. The biases in cfDNA fragmentomic features vary among races in a sample-dependent manner and therefore might adversely affect the performance of cancer diagnosis assays across multiple clinical centers. To circumvent these analytical biases, we develop Freefly, a reference-free approach for cfDNA fragmentomics profiling. Freefly runs ∼60-fold faster than the conventional reference-based approach while generating highly consistent results. Moreover, cfDNA fragmentomic features reported by Freefly can be used directly for cancer diagnosis. Hence, Freefly has translational merit for the rapid and unbiased measurement of cfDNA fragmentomics.
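    Some fragmentomic features can be read directly off the sequencing reads without any alignment. One example is the 5′ end motif, the first k bases of a fragment. This minimal sketch counts those motifs from adapter-trimmed read sequences and skips reads that are too short or contain ambiguous bases. Whether Freefly computes this feature, and how it handles paired-end mates, is an assumption rather than something stated above. The function name is hypothetical.

```python
from collections import Counter

def end_motif_frequencies(reads, k=4):
    """Relative frequencies of 5' end k-mers taken straight from read sequences (no reference genome).

    Reads shorter than k, or whose first k bases contain 'N', are skipped.
    """
    counts = Counter()
    for read in reads:
        motif = read[:k].upper()
        if len(motif) == k and "N" not in motif:
            counts[motif] += 1
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}
```

    Because no alignment is involved, the result cannot depend on which reference genome would otherwise have been chosen.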

  • Article type: Journal Article
    Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cellular responses to perturbations such as therapeutic interventions and vaccines. Gene relevance to such perturbations is often assessed through differential expression analysis (DEA), which offers a one-dimensional view of the transcriptomic landscape. This method potentially overlooks genes with modest expression changes but profound downstream effects and is susceptible to false positives. We present GENIX (gene expression network importance examination), a computational framework that transcends DEA by constructing gene association networks and employing a network-based comparative model to identify topological signature genes. We benchmark GENIX using both synthetic and experimental datasets, including analysis of influenza vaccine-induced immune responses in peripheral blood mononuclear cells (PBMCs) from recovered COVID-19 patients. GENIX successfully emulates key characteristics of biological networks and reveals signature genes that are missed by classical DEA, thereby broadening the scope of target gene discovery in precision medicine.
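    The contrast with one-dimensional DEA can be illustrated by comparing network topology between conditions. This sketch builds a thresholded co-expression network per condition and reports each gene's change in degree. A gene can gain many connections with little change in its mean expression. GENIX's actual association measure and comparative model are not described in the abstract, so absolute Pearson correlation with a fixed threshold and the degree difference are simple stand-ins. Function names are hypothetical.

```python
import numpy as np

def coexpression_network(expr, threshold=0.5):
    """Binary adjacency matrix: genes linked when |Pearson r| >= threshold (cells x genes input)."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj

def degree_change(expr_a, expr_b, threshold=0.5):
    """Per-gene degree in condition B minus degree in condition A."""
    deg_a = coexpression_network(expr_a, threshold).sum(axis=0)
    deg_b = coexpression_network(expr_b, threshold).sum(axis=0)
    return deg_b - deg_a
```

    In practice, a topology-based score like this would be combined with, not substituted for, expression-level statistics.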

  • Article type: Journal Article
    We present an innovative strategy for integrating whole-genome multi-omics data that enables adaptive fusion by leveraging hidden-layer features derived from high-dimensional omics data through a multi-task encoder. Empirical evaluations on eight benchmark cancer datasets showed that our proposed framework outperformed the compared algorithms in cancer subtyping, delivering superior subtyping results. Building on these subtyping results, we establish a robust pipeline for identifying genome-wide biomarkers, uncovering 195 significant biomarkers. Furthermore, we conduct an exhaustive analysis to assess, at the whole-genome level, the importance of each omics type and of non-coding-region features during cancer subtyping. Our investigation shows that both omics and non-coding-region features substantially impact cancer development and survival prognosis. This study emphasizes the potential and practical implications of integrating genome-wide data in cancer research, demonstrating the power of comprehensive genomic characterization. Additionally, our findings offer useful perspectives for multi-omics analysis with deep learning methods.
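    "Adaptive fusion" of per-omics hidden features is often implemented as a learned, per-sample weighting of the modality embeddings. This minimal sketch scores each omics embedding with a gating vector, softmax-normalises the scores across omics types, and takes the weighted sum. The abstract does not give the actual fusion rule of the multi-task encoder, so this gating scheme and the fixed `gate` vector are assumptions. The per-sample weights also offer one rough way to read off how much each omics type contributes.

```python
import numpy as np

def adaptive_fusion(embeddings, gate):
    """Fuse per-omics hidden features with softmax attention weights.

    embeddings: list of (n_samples, d) arrays, one per omics layer
    gate: (d,) scoring vector (learned in a real model; fixed here for illustration)
    Returns the fused (n_samples, d) features and the (n_samples, n_omics) weights.
    """
    stacked = np.stack(embeddings, axis=1)                # (n, n_omics, d)
    scores = stacked @ gate                               # (n, n_omics)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    fused = (weights[:, :, None] * stacked).sum(axis=1)   # (n, d)
    return fused, weights
```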

  • Article type: Journal Article
    Predicting cellular responses to perturbations requires interpretable insights into molecular regulatory dynamics to perform reliable cell fate control, despite the confounding non-linearity of the underlying interactions. There is a growing interest in developing machine learning-based perturbation response prediction models to handle the non-linearity of perturbation data, but their interpretation in terms of molecular regulatory dynamics remains a challenge. Alternatively, for meaningful biological interpretation, logical network models such as Boolean networks are widely used in systems biology to represent intracellular molecular regulation. However, determining the appropriate regulatory logic of large-scale networks remains an obstacle due to the high-dimensional and discontinuous search space. To tackle these challenges, we present a scalable derivative-free optimizer trained by meta-reinforcement learning for Boolean network models. The logical network model optimized by the trained optimizer successfully predicts anti-cancer drug responses of cancer cell lines, while simultaneously providing insight into their underlying molecular regulatory mechanisms.
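    The object being optimised here is a Boolean network: each node's next state is a logical function of the current states. This sketch shows only that model, not the meta-reinforcement-learning optimizer. It synchronously updates a candidate rule set, optionally with knocked-out nodes (a simple way to mimic a drug or gene perturbation), until it reaches an attractor. The toy EGFR/ERK rules in the usage note are illustrative and are not taken from the paper.

```python
def simulate(rules, state, knockouts=(), max_steps=100):
    """Synchronously update a Boolean network until it revisits a state.

    rules: dict node -> function(current_state_dict) -> bool
    state: dict node -> bool (initial state)
    knockouts: nodes forced to False at every step (perturbation)
    Returns the attractor as a list of states (length 1 for a fixed point).
    """
    state = dict(state)
    for node in knockouts:
        state[node] = False
    seen = {}
    trajectory = []
    for step in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return trajectory[seen[key]:]  # fixed point or limit cycle
        seen[key] = step
        trajectory.append(dict(state))
        new_state = {node: bool(rule(state)) for node, rule in rules.items()}
        for node in knockouts:
            new_state[node] = False
        state = new_state
    raise RuntimeError("no attractor reached within max_steps")
```

    For example, with rules `ERK = EGFR`, `Prolif = ERK`, `Apoptosis = not ERK`, the unperturbed network settles into a proliferative fixed point. Knocking out EGFR instead yields an apoptotic one. An optimizer would search over such rule sets so that the simulated responses match measured drug responses.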
