gene expression data

  • Article type: Journal Article
    Identifying diagnostic biomarkers for cancer is crucial in the field of personalized medicine. The available transcriptome and interactome data provide unprecedented opportunities and challenges for biomarker screening. From a systems perspective, network-based medicine methods provide alternative approaches to organizing the available high-throughput omics data for deciphering molecular interactions and their associations with phenotypic states. In this work, we propose a bioinformatics strategy named TopMarker for discovering diagnostic biomarkers by comparing network topology differences between control and disease samples. Specifically, we build gene-gene interaction networks separately for the control and disease states. Network rewiring between the two states produces differential network topologies, which reflect the dynamics and changes in normal samples compared with disease samples. Thus, we identify potential biomarker genes as those whose network topological parameters differ between the control and disease gene networks. As a proof-of-concept study, we present the computational pipeline for biomarker discovery in hepatocellular carcinoma (HCC). We demonstrate the effectiveness of the proposed TopMarker method by using these candidate biomarkers to classify HCC samples, and we validate its signature capability across numerous independent datasets. We also compare the discriminant power of the biomarker genes identified by TopMarker with those identified by other baseline methods. The higher classification performance and functional implications indicate the advantages of our proposed method for discovering biomarkers from differential network topology.
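The topology comparison can be illustrated with a minimal Python sketch. It assumes, as our own simplification rather than TopMarker's actual procedure, that each state's network links two genes when the absolute Pearson correlation of their expression profiles is at least a hypothetical threshold of 0.8, and that node degree is the only topological parameter compared. All function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as unlinked.
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_degrees(expr, threshold=0.8):
    """Degree of each gene in a network linking genes with |r| >= threshold.

    expr maps gene name -> list of expression values across samples.
    """
    degree = {g: 0 for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


def rank_by_degree_change(control, disease, threshold=0.8):
    """Rank genes by absolute degree difference between the two networks.

    Both inputs must contain the same genes; ties are broken by gene name.
    """
    dc = coexpression_degrees(control, threshold)
    dd = coexpression_degrees(disease, threshold)
    diff = {g: abs(dd[g] - dc[g]) for g in control}
    return sorted(diff.items(), key=lambda kv: (-kv[1], kv[0]))
```

Genes whose connectivity changes most between the two networks are ranked first; a fuller version would compare several topological parameters, such as betweenness or closeness centrality, rather than degree alone.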
  • Article type: Journal Article
    OBJECTIVE: This study aimed to identify significantly differentially expressed genes (DEGs) related to cervical cancer by exploring extensive gene expression datasets to unveil new therapeutic targets.
    METHODS: Gene expression profiles were extracted from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), and the Genotype-Tissue Expression (GTEx) platform. Differential expression analysis was used to identify DEGs in cervical cancer cases. Weighted gene co-expression network analysis (WGCNA) was implemented to locate genes closely linked to the clinical traits of the disease. Machine learning algorithms, including LASSO regression and random forest, were applied to pinpoint key genes.
    RESULTS: The investigation successfully isolated DEGs pertinent to cervical cancer. Interleukin-24 was recognized as a pivotal gene via WGCNA and machine learning techniques. Experimental validations demonstrated that human interleukin (hIL)-24 inhibited proliferation, migration, and invasion, while promoting apoptosis, in SiHa and HeLa cervical cancer cells, affirming its role as a therapeutic target.
    CONCLUSIONS: The multi-database analysis strategy employed herein emphasized hIL-24 as a principal gene in cervical cancer pathogenesis. The findings suggest hIL-24 as a promising candidate for targeted therapy, offering a potential avenue for innovative treatment modalities. This study enhances the understanding of molecular mechanisms of cervical cancer and aids in the pursuit of novel oncological therapies.
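As a rough illustration of the first screening step, the sketch below flags genes by log2 fold change between tumor and normal group means. The pseudocount of 1 and the |log2FC| >= 1 cutoff are assumptions, and the gene names are placeholders; the study's exact DEG criteria are not given here, and established tools such as limma or DESeq2 additionally model variance and report adjusted p-values.

```python
from math import log2
from statistics import mean


def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """log2((mean tumor + pc) / (mean normal + pc)) for each gene.

    tumor and normal map gene name -> list of expression values.
    The pseudocount avoids division by zero for unexpressed genes.
    """
    return {
        g: log2((mean(tumor[g]) + pseudocount) / (mean(normal[g]) + pseudocount))
        for g in tumor
    }


def screen_degs(tumor, normal, min_abs_lfc=1.0, pseudocount=1.0):
    """Return (up, down): sorted genes passing the fold-change cutoff."""
    lfc = log2_fold_changes(tumor, normal, pseudocount)
    up = sorted(g for g, v in lfc.items() if v >= min_abs_lfc)
    down = sorted(g for g, v in lfc.items() if v <= -min_abs_lfc)
    return up, down
```

Genes passing such a screen would then be the input to the co-expression (WGCNA) and LASSO/random-forest steps described above.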
  • Article type: Journal Article
    BACKGROUND: There is a great need for computational approaches to analyze and exploit the information contained in gene expression data. The recent use of nonnegative matrix factorization (NMF) in computational biology has demonstrated its capability to derive essential details from large amounts of data, in particular gene expression microarrays. A common problem in NMF is finding the proper rank (r), i.e., the number of factors in the reduced representation, but there is no agreement on which technique is most appropriate for this purpose. Thus, various techniques have been suggested to select the optimal factorization rank (r).
    OBJECTIVE: In this work, a new metric for rank selection based on the elbow method is proposed and methodically compared against the cophenetic metric.
    METHODS: To determine the optimal rank (r), this study focused on applying the unit invariant knee (UIK) method to NMF of gene expression data sets. The UIK method relies on an extremum distance estimator to identify inflection and knee points; using the gene expression data set as the target matrix, the proposed method applies UIK to the residual sum of squares (RSS) curve of the NMF algorithms and finds the first inflection point of its curvature.
    RESULTS: Computations for the UIK task were conducted using gene expression data from acute lymphoblastic leukemia and acute myeloid leukemia samples, and the NMF results obtained with different algorithms were compared. The proposed UIK method is easy to perform, fast, requires no a priori rank value input, and does not require initial parameters that significantly influence the model's functionality.
    CONCLUSIONS: This study demonstrates that the elbow method provides credible predictions both for gene expression data and for simulated mutational-process data with known dimensions, whose rank it estimates precisely. The proposed UIK method is faster than conventional methods, including metrics that use the consensus matrix as the criterion for rank selection, achieving significantly better computational efficiency without requiring visual inspection of the curves. Finally, the suggested elbow-based rank tuning method for gene expression data is arguably theoretically superior to the cophenetic measure.
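The elbow idea behind rank selection can be sketched as follows: compute the RSS for a range of candidate ranks, then choose the rank where the curve bends most sharply. This sketch deliberately swaps in a simpler heuristic, the point farthest from the chord joining the first and last points after rescaling both axes to [0, 1]; it is not the UIK/extremum distance estimator used in the study, and the RSS values in the usage example are made up.

```python
from math import hypot


def elbow_by_chord_distance(ranks, rss):
    """Return the rank whose (rank, RSS) point lies farthest from the chord
    joining the first and last points.

    A generic elbow heuristic used as a simplified stand-in; it is NOT the
    unit invariant knee (UIK) / extremum distance estimator. ranks must be
    strictly increasing.
    """
    if len(ranks) != len(rss) or len(ranks) < 3:
        raise ValueError("need at least three matching (rank, RSS) pairs")
    y_min, y_max = min(rss), max(rss)
    if y_max == y_min:
        raise ValueError("flat RSS curve has no elbow")
    # Rescale both axes to [0, 1] so the choice does not depend on units.
    x0, x1 = ranks[0], ranks[-1]
    xs = [(r - x0) / (x1 - x0) for r in ranks]
    ys = [(v - y_min) / (y_max - y_min) for v in rss]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = hypot(dx, dy)

    def dist(i):
        # Perpendicular distance from point i to the chord.
        return abs(dy * (xs[i] - xs[0]) - dx * (ys[i] - ys[0])) / norm

    best = max(range(1, len(ranks) - 1), key=dist)
    return ranks[best]
```

For example, `elbow_by_chord_distance([2, 3, 4, 5, 6, 7], [100, 60, 30, 25, 22, 20])` selects rank 4 for these toy RSS values. Rescaling both axes first makes the choice independent of the units of RSS, which is the same motivation behind unit-invariant approaches.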
  • Article type: Journal Article
    Selecting an appropriate similarity measurement method is crucial for obtaining biologically meaningful clustering modules. Commonly used measurement methods fail to capture the complexity of biological systems and cannot accurately represent their intricate interactions.
    This study aimed to obtain biologically meaningful gene modules by using a clustering algorithm based on a new similarity measurement method.
    A new algorithm called the Dual-Index Nearest Neighbor Similarity Measure (DINNSM) was proposed. The algorithm first calculates the similarity matrix between genes using Pearson's or Spearman's correlation, and this matrix is then used to construct a nearest-neighbor table. The final similarity matrix is reconstructed from the positions of shared genes in the nearest-neighbor table and the number of shared genes.
    Experiments were conducted on five different gene expression datasets, comparing DINNSM with five widely used similarity measurement techniques for gene expression data. The results demonstrate that clustering with DINNSM as the similarity measure performed better than clustering with the alternative measures.
    DINNSM provided more accurate insights into the intricate biological connections among genes, facilitating the identification of more accurate and biologically meaningful gene co-expression modules.
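The two-stage structure described above (a correlation matrix, then a nearest-neighbor table, then a rebuilt similarity matrix) can be sketched in Python. The scoring rule here is hypothetical: each shared neighbor g of genes i and j contributes (k - pos_i(g)) + (k - pos_j(g)), where pos is the 0-based position in a k-nearest-neighbor list, and the total is divided by its maximum k(k + 1). It combines the number and positions of shared genes as the abstract describes, but it is not the published DINNSM formula.

```python
def knn_table(sim, k):
    """For each gene i, indices of its k most similar other genes, most similar first.

    sim is a symmetric n x n similarity matrix (e.g. Pearson or Spearman
    correlations) given as a list of lists; ties are broken by gene index.
    """
    n = len(sim)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of genes")
    table = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: (-sim[i][j], j))
        table.append(others[:k])
    return table


def shared_neighbor_similarity(sim, k):
    """Hypothetical rank-weighted shared-nearest-neighbor similarity in [0, 1]."""
    table = knn_table(sim, k)
    # Position of each neighbor within each gene's k-NN list.
    pos = [{g: p for p, g in enumerate(row)} for row in table]
    n = len(sim)
    max_score = k * (k + 1)  # reached when both lists contain the same k genes
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = pos[i].keys() & pos[j].keys()
            total = sum((k - pos[i][g]) + (k - pos[j][g]) for g in shared)
            out[i][j] = total / max_score
    return out
```

In a toy example, two genes with low direct correlation can still score highly if their nearest-neighbor lists overlap, which is the property that shared-neighbor measures exploit.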
  • Article type: Journal Article
    High-throughput technologies are becoming increasingly important in discovering prognostic biomarkers and identifying novel drug targets. With MammaPrint, Oncotype DX, and many other prognostic molecular signatures, breast cancer is one of the paradigmatic examples of the utility of high-throughput data for delivering prognostic biomarkers, which can be represented in the form of a rather short gene list. Such gene lists can be obtained as a set of features (genes) that are important for the decisions of a machine learning (ML) method applied to high-dimensional gene expression data. Several studies have identified predictive gene lists for patient prognosis in breast cancer, but these lists are unstable and have only a few genes in common. Instability of feature selection impedes biological interpretability: genes that are relevant to cancer pathology should be members of any predictive gene list obtained for the same clinical type of patients. The stability and interpretability of selected features can be improved by including information on molecular networks in ML methods. The graph convolutional neural network (GCNN) is a contemporary deep learning approach applicable to gene expression data structured by a prior-knowledge molecular network. Layer-wise Relevance Propagation (LRP) and SHapley Additive exPlanations (SHAP) are methods for explaining individual decisions of deep learning models. We used both GCNN+LRP and GCNN+SHAP to construct feature sets by aggregating individual explanations. We propose a methodology to systematically and quantitatively analyze the stability, the impact on classification performance, and the interpretability of the selected feature sets, and we used it to compare GCNN+LRP with GCNN+SHAP and with more classical ML-based feature selection approaches. Using a large breast cancer gene expression dataset, we show that while feature selection with SHAP is useful in applications where the selected features must have an impact on classification performance, GCNN+LRP delivers the most stable (reproducible) and interpretable gene lists among all studied methods.
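Two steps mentioned above can be sketched in a few lines: aggregating per-patient explanations into one gene list, and scoring the stability of gene lists obtained from repeated runs. Both choices are assumptions for illustration: aggregation by mean absolute relevance with a top-k cut, and stability as the mean pairwise Jaccard index. The study's exact definitions may differ.

```python
from itertools import combinations


def top_k_by_mean_relevance(explanations, k):
    """Aggregate per-patient explanations into one gene list.

    explanations: list of dicts mapping gene -> relevance (e.g. an LRP or
    SHAP value) for one patient. Genes are ranked by mean absolute relevance
    (an assumed aggregation rule); ties are broken alphabetically.
    """
    genes = explanations[0].keys()
    mean_abs = {g: sum(abs(e[g]) for e in explanations) / len(explanations) for g in genes}
    return sorted(mean_abs, key=lambda g: (-mean_abs[g], g))[:k]


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(gene_lists):
    """Average Jaccard index over all pairs of gene lists (1.0 = perfectly stable)."""
    pairs = list(combinations(gene_lists, 2))
    if not pairs:
        raise ValueError("need at least two gene lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Running the aggregation on, say, different cross-validation folds and passing the resulting lists to `mean_pairwise_jaccard` gives one number per method, which makes stability comparable across feature selection approaches.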
  • Article type: Journal Article
    BACKGROUND: Helicobacter pylori is considered a true human pathogen, and its rising drug resistance is a serious global concern. The present study aimed to reconstruct a genome-scale metabolic model (GSMM) to decipher the metabolic capabilities of H. pylori strains in response to clarithromycin and rifampicin and to identify novel drug targets.
    METHODS: The iIT341 model of H. pylori was updated based on genome annotation data and on biochemical knowledge from the literature and databases. Context-specific models were generated by integrating transcriptomic data from clarithromycin- and rifampicin-resistant strains into the model. Flux balance analysis was employed to identify essential genes in each strain, which were further prioritized by non-homology to human genes, virulence factor analysis, druggability, and broad-spectrum analysis. Additionally, metabolic differences between sensitive and resistant strains were investigated using flux variability analysis and pathway enrichment analysis of transcriptomic data.
    RESULTS: The reconstructed GSMM was named HpM485. Pathway enrichment and flux variability analyses demonstrated reduced activity of the ribosomal pathway in both clarithromycin- and rifampicin-resistant strains. A significant decrease was also detected in the activity of metabolic pathways in the clarithromycin-resistant strain. Moreover, 23 and 16 essential genes were detected exclusively in the clarithromycin- and rifampicin-resistant strains, respectively. Based on prioritization analysis, cyclopropane fatty acid synthase and phosphoenolpyruvate synthase were identified as putative drug targets in the clarithromycin- and rifampicin-resistant strains, respectively.
    CONCLUSIONS: We present a robust and reliable metabolic model of H. pylori. This model can be used to predict novel drug targets for combating drug resistance and to explore the metabolic capabilities of H. pylori under various conditions.
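For reference, flux balance analysis, named in the methods above, is usually posed as a linear program over the flux vector v, with stoichiometric matrix S and an objective vector c that typically selects the biomass reaction. The standard formulation, together with a common essentiality criterion, is:

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & Z = \mathbf{c}^{\top}\mathbf{v} \\
\text{s.t.} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{v}^{\mathrm{lb}} \le \mathbf{v} \le \mathbf{v}^{\mathrm{ub}}, \\[4pt]
\text{gene } g \text{ essential} \iff \quad & Z^{*}_{\Delta g} < \tau \, Z^{*}_{\mathrm{WT}}
\end{aligned}
```

Here Z*_WT is the optimal growth of the unperturbed model and Z*_Δg the optimum after fixing to zero the bounds of every reaction that can no longer be catalyzed once g is deleted (according to the model's gene-protein-reaction rules). The threshold τ is a user-chosen fraction that varies between studies and is an assumption here. Toolkits such as COBRApy provide routines for single-gene deletion studies of this kind.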
  • Article type: Journal Article
    Gene expression data are typically high-dimensional, with a limited number of samples, and contain many features unrelated to the disease of interest. Existing unsupervised feature selection algorithms focus primarily on the importance of features in preserving the data structure, without taking into account redundancy among features. Determining the appropriate number of significant features is another challenge.
    In this paper, we propose a clustering-guided unsupervised feature selection (CGUFS) algorithm for gene expression data that addresses these problems. The algorithm introduces three improvements over existing algorithms. First, because existing clustering algorithms require the number of clusters to be specified manually, we propose an adaptive k-value strategy that assigns appropriate pseudo-labels to each sample by iteratively updating a change function. Second, because existing algorithms fail to consider redundancy among features, we propose a feature grouping strategy that groups highly redundant features. Third, because existing algorithms cannot filter out redundant features, we propose an adaptive filtering strategy that determines which feature combinations to retain by identifying the potentially effective and potentially redundant features in each feature group.
    Experimental results show that the average accuracy (ACC) and Matthews correlation coefficient (MCC) of the C4.5 classifier on the optimal features selected by CGUFS reach 74.37% and 63.84%, respectively, significantly better than those obtained with existing algorithms.
    Similarly, the average ACC and MCC of the AdaBoost classifier on the optimal features selected by CGUFS are significantly better than those of existing algorithms. In addition, statistical tests show significant differences between CGUFS and the existing algorithms.
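The two reported metrics can be computed from a confusion matrix as in the sketch below. It uses the binary form of the Matthews correlation coefficient, which ranges from -1 to 1; how the study handled multiclass datasets is not stated here, and generalized versions exist (for example, scikit-learn's `matthews_corrcoef` also accepts multiclass labels).

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient.

    Returns 0.0 when any marginal total is zero and the coefficient is undefined.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC stays near 0 for a classifier that predicts only the majority class, which is why it is often reported alongside ACC on imbalanced gene expression datasets.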
  • Article type: Journal Article
    Cancer is one of the leading causes of death worldwide. Survival analysis and prediction for cancer patients are of great significance for precision medicine. The robustness and interpretability of survival prediction models are important: robustness indicates whether a model has truly learned the underlying knowledge, and interpretability indicates whether a model can show humans what it has learned. In this paper, we propose a robust and interpretable model, SurvConvMixer, which uses pathway-customized gene expression images and ConvMixer for short-term, mid-term and long-term overall survival prediction in cancer. With ConvMixer, the representation of each pathway can be learned separately. We demonstrate the robustness of our model by testing the trained model on external datasets that were never used in training. The interpretability of SurvConvMixer relies on gradient-weighted class activation mapping (Grad-CAM), from which we obtain pathway-level activation heat maps. Wilcoxon rank-sum tests are then conducted to identify statistically significant pathways, revealing which pathways the model focuses on most. SurvConvMixer achieves remarkable performance for short-term, mid-term and long-term overall survival prediction in lung adenocarcinoma, lung squamous cell carcinoma and skin cutaneous melanoma, and external validation tests show that SurvConvMixer generalizes to external datasets, indicating that it is robust. Finally, by examining the Grad-CAM activation maps with Wilcoxon rank-sum tests and Kaplan-Meier estimation, we find that several survival-related pathways play important roles in SurvConvMixer.
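The pathway-level test described above can be sketched with the Wilcoxon rank-sum statistic, computed here with tie-averaged ranks and the large-sample normal approximation (no tie or continuity correction). The grouping, for instance activation scores of one pathway in high-risk versus low-risk patients, is an assumption for illustration; `scipy.stats.ranksums` offers a library implementation based on the same normal approximation.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (W, z, p), where W is the rank sum of sample x in the pooled data.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return w, z, p
```

With one test per pathway, the resulting p-values would normally be corrected for multiple testing before declaring pathways significant.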
  • Article type: Journal Article
    Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet previous studies leave it unclear which method is best for a given dataset. The main issues with previous benchmarks include the difficulty of correctly assigning true pathways to a test dataset and the lack of generality of the evaluation metrics, which commonly rely on the rank of a single target pathway. Here we provide a generalized EA benchmark and apply it to the most widely used EA methods, representing all four categories of current approaches. The benchmark employs a new set of 82 curated gene expression datasets from DNA microarray and RNA-Seq experiments for 26 diseases, of which only 13 are cancers. To address the shortcomings of the single-target-pathway approach and to enhance the sensitivity evaluation, we present the Disease Pathway Network, in which related Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways are linked. We introduce a novel approach to evaluate pathway EA by combining sensitivity and specificity, providing a balanced evaluation of EA methods. This approach identifies Network Enrichment Analysis methods as the overall top performers compared with overlap-based methods. Using randomized gene expression datasets, we explore the null-hypothesis bias of each method, revealing that most of them produce skewed P-values.
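Overlap-based enrichment methods, the comparison category above, commonly score a pathway with a one-sided hypergeometric test: given N background genes, K of them in the pathway, and n differentially expressed genes, how surprising is an overlap of at least k? A minimal pure-Python sketch (the function name is illustrative):

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K pathway genes, n selected genes).

    This is the one-sided over-representation p-value of classic
    overlap-based enrichment analysis (equivalent to a one-sided
    Fisher's exact test on the 2x2 table).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probabilities of every overlap size from k up to the maximum possible.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

`scipy.stats.hypergeom.sf(k - 1, N, K, n)` should return the same upper-tail probability (SciPy orders the parameters as population size, number of successes, number of draws). Because such tests treat genes as independent draws, they are one source of the skewed null P-values the benchmark examines.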
  • Article type: Journal Article
    A multiscale method, proposed elsewhere, for reconstructing plausible 3D configurations of chromatin in cell nuclei is recalled; it is based on integrating contact data from Hi-C experiments with additional information from ChIP-seq, RNA-seq and ChIA-PET experiments. Provided that the additional data come from independent experiments, this kind of approach is expected to leverage them to complement possibly noisy, biased or missing Hi-C records. When the different data sources agree with one another, the resulting solutions are corroborated; otherwise, their validity is weakened. This raises a reliability problem, which requires an appropriate choice of the relative weights assigned to the different information sources. A series of experiments is presented that helps quantify the advantages and limitations of this strategy. Although the gains in accuracy are not always significant, the case of missing Hi-C data demonstrates the effectiveness of additional information in reconstructing the highly packed segments of the structure.
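The weighting problem raised above can be made concrete with a generic schematic of a multi-source objective. This is an illustrative assumption, not the functional of the cited method: x_i are bead coordinates, C is the set of bead pairs with Hi-C contacts, d_ij is a target distance derived from the contact frequency f_ij (for instance through an assumed power law d_ij ∝ f_ij^(-α)), and each E_s is a penalty term for an additional data source s (ChIP-seq, RNA-seq or ChIA-PET) with weight w_s.

```latex
E(\mathbf{X}) \;=\; w_{\mathrm{HiC}} \sum_{(i,j)\in\mathcal{C}} \bigl(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert - d_{ij}\bigr)^{2}
\;+\; \sum_{s} w_s \, E_s(\mathbf{X})
```

Raising a weight w_s lets source s dominate wherever it conflicts with the Hi-C term, which is why the choice of relative weights governs the reliability of the result; where Hi-C records are missing, the sum over C contributes nothing for those beads and the additional terms alone constrain them.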