limma

  • Article type: Journal Article
    Introduction: With advances in RNA-seq technology and machine learning, training machine learning models on large-scale RNA-seq data from databases can identify genes with important regulatory roles that standard linear analytic methods previously missed. Finding tissue-specific genes could improve our understanding of the relationship between tissues and genes. However, few machine learning models for transcriptome data have been deployed and compared for identifying tissue-specific genes, particularly in plants. Methods: In this study, an expression matrix built from 1,548 maize multi-tissue RNA-seq samples obtained from a public database was processed with linear models (Limma), machine learning models (LightGBM), and deep learning models (CNN), using information gain and the SHAP strategy, to identify tissue-specific genes. For validation, V-measure values were computed from k-means clustering of the gene sets to evaluate their technical complementarity. Furthermore, GO analysis and literature retrieval were used to validate the functions and research status of these genes. Results: Based on clustering validation, the convolutional neural network outperformed the other methods with a higher V-measure value of 0.647, indicating that its gene set could cover as many specific properties of various tissues as possible, whereas LightGBM discovered key transcription factors. Combining the three gene sets produced 78 core tissue-specific genes that had previously been shown in the literature to be biologically significant. Discussion: Different tissue-specific gene sets were identified because of the distinct interpretation strategies of the machine learning models, and researchers may use multiple methodologies and strategies for tissue-specific gene sets depending on their goals, types of data, and computational resources. This study provides comparative insight for large-scale data mining of transcriptome datasets, shedding light on resolving high-dimensionality and bias difficulties in bioinformatics data processing.
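The V-measure used for clustering validation is the harmonic mean of homogeneity and completeness. The following is a minimal sketch of that calculation in plain Python, not the authors' code. It assumes the cluster labels have already been produced (for example by k-means on a gene set), and the tissue names and cluster ids in the usage note are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two parallel label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    h_true = entropy(true_labels)
    h_cluster = entropy(cluster_labels)
    # Homogeneity: each cluster contains samples of a single true class.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    )
    # Completeness: all samples of a true class fall in the same cluster.
    completeness = (
        1.0 if h_cluster == 0
        else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_cluster
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

For instance, `v_measure(["root", "root", "leaf", "leaf"], [0, 0, 1, 1])` scores a clustering that separates the two hypothetical tissues perfectly, while a clustering that mixes them evenly scores close to zero.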

  • Article type: Journal Article
    Background: Coronavirus (CoV) is an emerging human pathogen causing severe acute respiratory syndrome (SARS) around the world. Earlier identification of biomarkers for SARS can facilitate detection and reduce the mortality rate of the disease. Thus, using an integrated network analysis and structural modeling approach, we aimed to explore potential drug targets and candidate drugs for coronavirus-mediated SARS. Methods: Differential expression (DE) analysis of the expression profiles of CoV-infected host genes (HGs) was conducted using Limma. Highly integrated DE-CoV-HGs were selected to construct the protein-protein interaction (PPI) network. Results: Using the Walktrap algorithm, highly interconnected modules, including module 1 (202 nodes), module 2 (126 nodes), and module 3 (121 nodes), were retrieved from the PPI network. MYC, HDAC9, NCOA3, CEBPB, VEGFA, BCL3, SMAD3, SMURF1, KLHL12, CBL, ERBB4, and CRKL were identified as potential drug targets (PDTs), which are highly expressed in the human respiratory system after CoV infection. The modules were enriched for the functional terms growth factor receptor binding, c-type lectin receptor signaling, interleukin-1 mediated signaling, TAP-dependent antigen processing and presentation of peptide antigen via MHC class I, stimulatory T cell receptor signaling, and innate immune response signaling pathways, as well as signal transduction and cytokine immune signaling pathways. Protein-protein docking results demonstrated the strong binding affinity (-314.57 kcal/mol) of the ERBB4-3CLpro complex, which was selected as a drug target. In addition, molecular dynamics simulations indicated the structural stability and flexibility of the ERBB4-3CLpro complex. Further, Wortmannin was proposed as a candidate drug against ERBB4 to control SARS-CoV-2 pathogenesis by inhibiting receptor tyrosine kinase-dependent macropinocytosis, MAPK signaling, and NF-κB signaling pathways that regulate host cell entry, replication, and modulation of the host immune system. Conclusion: We conclude that the CoV drug target "ERBB4" and the candidate drug "Wortmannin" provide insights into possible personalized therapeutics for emerging COVID-19.
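The differential expression step can be illustrated with a deliberately simplified sketch. Limma itself is an R/Bioconductor package that fits linear models with empirical Bayes moderated t-statistics; the Python below is not limma and not the authors' pipeline. It ranks genes by a plain Welch t-statistic and log2 fold change, assuming the values are already log2-scale. The function names, gene names, and data layout are all hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two lists of log2 expression values."""
    se = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    diff = mean(group_a) - mean(group_b)
    if se == 0:
        # Degenerate case: no within-group variation at all.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(expr, infected_idx, control_idx):
    """Rank genes by |t|.

    expr maps a gene name to a list of log2 expression values, one per
    sample; the two index lists select the infected and control samples.
    Returns (gene, log2 fold change, t-statistic) tuples.
    """
    rows = []
    for gene, values in expr.items():
        a = [values[i] for i in infected_idx]
        b = [values[i] for i in control_idx]
        # On the log2 scale, the fold change is a difference of means.
        rows.append((gene, mean(a) - mean(b), welch_t(a, b)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)
```

A real analysis would also need multiple-testing correction and, as in limma, variance moderation across genes, both of which this sketch omits.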

  • Article type: Journal Article
    Human mesenchymal stem cells (hMSCs) have the capacity to differentiate into cartilage, muscle, marrow stroma, tendon/ligament, fat, and other connective tissues, providing a potential source for tissue regeneration. The aim of this study was to find the key transcription factors (TFs) that regulate osteogenic differentiation of hMSCs. Three methods were used to find the key TFs: enrichment analysis, direct impact value, and indirect impact value. We used the protein-protein interaction (PPI) network to integrate the results of these methods for analysis. We then compared the osteoblast data with the control group on days 1, 3 and 7. Finally, we identified an optimal combination of 30 vital TFs related to osteogenic differentiation. The TFs FOS, SOX9 and EP300 were commonly expressed on all three days in the osteogenic lineages and appeared in the PPI network with relatively high degrees. Moreover, the TFs CREBBP, ESR1 and EGR1 also showed strong effects on days 1, 3 and 7. The constructed network gives a more comprehensive understanding of the mechanism of osteogenesis of hMSCs.
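The notion of a node having a "relatively high degree" in a PPI network can be sketched as follows. This is an illustrative example only, assuming the network is available as an undirected edge list; the function names and the placeholder node names in the usage note are hypothetical and do not represent real interactions.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count distinct neighbors per node in an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        # Sets make duplicate and reversed edges count only once.
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def top_hubs(edges, k):
    """Return the k highest-degree nodes as (node, degree) pairs.

    Ties are broken alphabetically so the result is deterministic.
    """
    degrees = node_degrees(edges)
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:k]
```

With the placeholder edges `[("TF_A", "TF_B"), ("TF_A", "TF_C"), ("TF_A", "TF_D"), ("TF_B", "TF_C")]`, calling `top_hubs(edges, 2)` puts `TF_A` first because it touches three other nodes.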

  • Article type: Journal Article
    Microarray data have accumulated vastly over the past two decades. Owing to their high-throughput nature, microarray techniques have shifted biological studies from specific genes to the transcriptome level and greatly advanced many fields of biology. While microarrays offer great advantages for expression profiling, they also pose many challenges for computational analysis. In this chapter, we demonstrate how to perform a standard analysis, including data preprocessing, quality assessment, differential expression analysis, and general downstream analyses.

  • Article type: Journal Article
    Recurrent aphthous stomatitis (RAS) is one of the most common chronic oral diseases, with a prevalence ranging from 5% to 25% in different populations. Its pathogenesis remains poorly understood, which limits the development of effective drugs and treatment methods. In this study, we conducted a systematic bioinformatics analysis of gene expression profiles from the Gene Expression Omnibus (GEO) to identify potential drug targets for RAS. First, we downloaded the gene microarray dataset with accession number GSE37265 from GEO and performed robust multi-array average (RMA) normalization with the affy R package. Second, differentially expressed genes (DEGs) in RAS samples compared with control samples were identified with the limma package. Enriched gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the DEGs were obtained through the Database for Annotation, Visualization and Integrated Discovery (DAVID). Finally, a protein-protein interaction (PPI) network was constructed by combining the HPRD and BioGrid databases. In addition, we identified modules of the PPI network with the MCODE plugin of Cytoscape in order to screen for valuable targets. As a result, 915 genes were found to be significantly differentially expressed in RAS samples, and biological processes related to immune and inflammatory responses were significantly enriched in those genes. Network and module analysis identified FBXO6, ITGA4, VCAM1, and others as valuable therapeutic targets for RAS. FBXO6, ITGA4, and VCAM1 were further confirmed by real-time RT-PCR and western blot. This study should be helpful for the research and treatment of RAS.
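RMA preprocessing is commonly described as background correction, quantile normalization, and probe-set summarization. The sketch below illustrates only the quantile normalization idea in plain Python; it is not the affy implementation and does not reproduce the study's pipeline. The function name and the data layout (one list of intensities per array) are assumptions, and ties are resolved by position rather than averaged.

```python
def quantile_normalize(columns):
    """Force every array to share the same empirical distribution.

    columns: list of equal-length lists, one list of intensities per array.
    Returns new lists in the same layout.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: the mean across arrays of the values
    # that occupy each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n)
    ]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization, every array contains exactly the same set of values, and each probe keeps the rank it had within its own array.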