Essential genes

  • Article type: Journal Article
    Sinorhizobium fredii CCBAU45436 is an excellent rhizobium that plays an important role in agricultural production. However, a more comprehensive understanding of its metabolic system is still lacking, which hinders its application in agriculture. Therefore, based on the first-generation metabolic model iCC541, we developed a new genome-scale metabolic model, iAQY970, which contains 970 genes, 1,052 reactions and 942 metabolites and scores 89% in the MEMOTE test. Growth phenotypes predicted by iAQY970 are 81.7% consistent with experimental data. Mapping proteome data from free-living and symbiotic conditions onto the model showed that the biomass production rate was faster in the logarithmic phase than in the stationary phase, and that the nitrogen fixation efficiency of rhizobia in symbiosis with cultivated soybean was higher than with wild soybean, in line with what is observed in practice. Under symbiotic conditions, 184 genes affect growth, of which 94 are essential; under free-living conditions, 143 genes affect growth, of which 78 are essential. Of the 94 essential genes under symbiotic conditions, 86 are consistent with the predictions of iCC541 and 44 are confirmed by the literature; in addition, 30 were found in DEG (the Database of Essential Genes) and 33 were identified by Geptop. Furthermore, we extracted four key nitrogen fixation modules from the model and, using MOMA, predicted sulfite reductase (EC 1.8.7.1) and nitrogenase (EC 1.18.6.1) as target enzymes for enhancing nitrogen fixation, which provides a potential focus for strain optimization. This comprehensive metabolic model gives a better understanding of the metabolic capabilities of S. fredii CCBAU45436 and will help make full use of it in the future.
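    The essential-gene counts above come from in silico gene deletions in the genome-scale model. The following is a minimal sketch of that idea. It rests on several assumptions: a hypothetical toy network, hypothetical gene names, and a simple reachability check standing in for flux balance analysis. The study itself would solve a linear program, and MOMA is a separate quadratic-programming method that is not shown here. Each knockout switches off the reactions whose gene-protein-reaction (GPR) rule becomes false, and a gene is called essential if biomass can no longer be produced from the medium.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model: reaction -> (substrates, products, GPR rule).
# A GPR rule is a list of alternatives (OR) of enzyme complexes (AND):
# [["g1", "g2"], ["g3"]] means (g1 AND g2) OR g3.
Reaction = Tuple[Set[str], Set[str], List[List[str]]]

TOY_MODEL: Dict[str, Reaction] = {
    "R_uptake": ({"glc_ext"}, {"glc"}, [["geneA"]]),
    "R_glyc": ({"glc"}, {"pyr"}, [["geneB", "geneC"]]),    # enzyme complex
    "R_bypass": ({"glc"}, {"pyr"}, [["geneD"], ["geneE"]]),  # isozymes
    "R_bio": ({"pyr"}, {"biomass"}, [["geneF"]]),
}
MEDIUM = {"glc_ext"}


def reaction_active(gpr: List[List[str]], knocked_out: Set[str]) -> bool:
    """Active if at least one complex has none of its genes deleted."""
    return any(all(g not in knocked_out for g in cx) for cx in gpr)


def can_make(target: str, model: Dict[str, Reaction], medium: Set[str],
             knocked_out: Set[str]) -> bool:
    """Expand the set of producible metabolites to a fixed point (topological, not FBA)."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for subs, prods, gpr in model.values():
            if (reaction_active(gpr, knocked_out) and subs <= available
                    and not prods <= available):
                available |= prods
                changed = True
    return target in available


def essential_genes(model: Dict[str, Reaction], medium: Set[str],
                    target: str = "biomass") -> List[str]:
    """Single-gene knockouts that abolish production of the target."""
    genes = sorted({g for _, _, gpr in model.values() for cx in gpr for g in cx})
    return [g for g in genes if not can_make(target, model, medium, {g})]


if __name__ == "__main__":
    print(essential_genes(TOY_MODEL, MEDIUM))  # ['geneA', 'geneF']
```

    Here geneB to geneE are non-essential because the isozyme bypass keeps pyruvate reachable. This mirrors why, in a real model, only some growth-affecting genes end up classed as essential.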

  • Article type: Journal Article
    BACKGROUND: Essential genes encode functions that play a vital role in the life activities of organisms, encompassing growth, development, immune system functioning, and cell structure maintenance. Conventional experimental techniques for identifying essential genes are resource-intensive and time-consuming, and the accuracy of current machine learning models needs further enhancement. Therefore, it is crucial to develop a robust computational model to accurately predict essential genes.
    RESULTS: In this study, we introduce GCNN-SFM, a computational model based on graph convolutional neural networks (GCNN) for identifying essential genes in organisms. GCNN-SFM integrates a graph convolutional layer, a convolutional layer, and a fully connected layer to model and extract features from the sequences of essential genes. First, each gene sequence is transformed into a feature map using encoding techniques. Next, a multi-layer GCN performs graph convolution operations that capture both local and global features of the sequence. After further feature extraction, the convolutional and fully connected layers generate the essential gene predictions. Gradient descent is used to iteratively minimize the cross-entropy loss, improving prediction accuracy, and the model parameters are tuned during training to find the combination that yields the best predictive performance.
    CONCLUSIONS: Experimental evaluation demonstrates that GCNN-SFM surpasses various advanced essential gene prediction models and achieves an average accuracy of 94.53%. This study presents a novel and effective approach for identifying essential genes, which has significant implications for biology and genomics research.
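    The abstract does not specify how GCNN-SFM turns a sequence into a graph. The following minimal sketch therefore makes several assumptions: one-hot nucleotide features, a path graph linking adjacent positions, and the common symmetric-normalized propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The mean pooling, logistic output, and binary cross-entropy are illustrative stand-ins for the paper's convolutional and fully connected layers. Training by gradient descent is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]
BASES = "ACGT"


def one_hot(seq: str) -> Matrix:
    """One row per nucleotide position, one column per base (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def path_adjacency_with_self_loops(n: int) -> Matrix:
    """A + I for a path graph linking each position to its neighbours (assumed graph)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = 1.0
    return a


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """ReLU(D^-1/2 (A+I) D^-1/2 H W), with D the degree matrix of A+I."""
    deg = [sum(row) for row in a_hat]
    n = len(a_hat)
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def predict_proba(h: Matrix, w_out: List[float], b: float) -> float:
    """Mean-pool node features, then apply a single logistic output unit."""
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logit = sum(p * w for p, w in zip(pooled, w_out)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(p: float, label: int) -> float:
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


if __name__ == "__main__":
    seq = "ATGC"
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    h1 = gcn_layer(path_adjacency_with_self_loops(len(seq)), one_hot(seq), identity)
    p = predict_proba(h1, [0.0] * 4, 0.0)  # untrained output weights give p = 0.5
    print(p, round(binary_cross_entropy(p, 1), 4))
```

    A path graph keeps only local neighbourhoods per layer. Stacking layers, as the abstract describes, widens the receptive field toward the more global features it mentions.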

  • Article type: Journal Article
    Despite the critical role of bacterial cell walls in maintaining cell shape, certain environmental stressors can induce many bacterial species to transition into a wall-deficient state called the L-form. Long-term induced Escherichia coli L-forms lose their rod shape and usually carry significant mutations that affect cell division and growth. Beyond this, the genetic background of L-form bacteria is still poorly understood. In the present study, the genomes of two stable E. coli L-form strains (NC-7 and LWF+) were sequenced, and their gene mutation status was determined and compared with that of their parental strains. Comparative genomic analysis of the two L-forms reveals both unique adaptations and commonly mutated genes, many of which belong to essential gene categories not involved in cell wall biosynthesis, indicating that L-form genetic adaptation affects crucial metabolic pathways. Missense variants from the L-forms and from Lenski's long-term evolution experiment (LTEE) were analyzed in parallel using an optimized DeepSequence pipeline to investigate predicted mutation effects (α) on protein function. We report that, in the two L-form strains analyzed, 6-10% of mutated essential genes carry missense variants with a substantial impact on protein function (α < 0.5), compared with 0% for the LTEE. This indicates that different survival strategies emerge in L-forms through changes in essential genes during adaptation to cell wall deficiency. Collectively, our results shed light on the detailed genetic background of two E. coli L-forms and pave the way for further investigation of gene functions in L-form bacterial models.
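    The 6-10% figure is a proportion over mutated essential genes. The following minimal sketch computes such a tally. The gene names and α scores are hypothetical; the real values come from the study's DeepSequence pipeline, which is not reproduced here. The sketch also assumes that a gene counts as strongly affected when any of its missense variants falls below the α < 0.5 threshold.

```python
from typing import Dict, List, Set


def strong_effect_fraction(alphas_by_gene: Dict[str, List[float]],
                           essential: Set[str], threshold: float = 0.5) -> float:
    """Fraction of mutated essential genes with >= 1 missense variant below the alpha threshold."""
    mutated_essential = [g for g, a in alphas_by_gene.items() if g in essential and a]
    if not mutated_essential:
        return 0.0
    hits = sum(1 for g in mutated_essential if min(alphas_by_gene[g]) < threshold)
    return hits / len(mutated_essential)


if __name__ == "__main__":
    # Hypothetical alpha scores: lower alpha = larger predicted impact on protein function.
    alphas = {"geneA": [0.92, 0.41], "geneB": [0.88], "geneC": [0.97, 0.76],
              "geneD": [0.63], "geneX": [0.12]}
    essential = {"geneA", "geneB", "geneC", "geneD"}  # geneX is not essential
    print(strong_effect_fraction(alphas, essential))  # 1 of 4 -> 0.25
```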

  • Article type: Journal Article
    Bacteria with streamlined genomes that retain fully functional genes for essential metabolic networks can synthesize desired products more effectively and therefore have advantages as production platforms in industrial applications. To obtain streamlined chassis genomes, considerable effort has been made to reduce existing bacterial genomes. This work falls into two categories: rational and random reduction. Over the past few decades, the identification of essential gene sets and the emergence of various genome-deletion techniques have greatly promoted genome reduction in many bacteria. Some of the constructed genomes have properties desirable for industrial applications, such as increased genome stability, transformation capacity, cell growth, and biomaterial productivity. However, reduced growth and perturbed physiological phenotypes in some genome-reduced strains may limit their use as optimized cell factories. This review assesses the progress made to date in bacterial genome reduction for constructing optimal chassis for synthetic biology, including the identification of essential gene sets, genome-deletion techniques, the properties and industrial applications of artificially streamlined genomes, the obstacles encountered in constructing reduced genomes, and future perspectives.
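    Rational reduction uses an essential gene set to decide what can be removed. The following minimal sketch proposes contiguous blocks of non-essential genes as candidate deletion regions. The ordered gene list, the essential set, and the minimum block length are all hypothetical. Real designs must also weigh operon structure, the growth effects of combined deletions, and synthetic lethality, none of which this sketch considers.

```python
from typing import List, Optional, Set, Tuple


def deletion_candidates(genes: List[str], essential: Set[str],
                        min_len: int = 2) -> List[Tuple[int, int]]:
    """Inclusive (start, end) index ranges of runs of non-essential genes."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, g in enumerate(genes):
        if g not in essential:
            if start is None:
                start = i  # a non-essential run begins
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None  # an essential gene closes the run
    if start is not None and len(genes) - start >= min_len:
        regions.append((start, len(genes) - 1))
    return regions


if __name__ == "__main__":
    chromosome = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]  # hypothetical gene order
    print(deletion_candidates(chromosome, {"g1", "g4"}))  # [(1, 2), (4, 6)]
```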

  • Article type: Journal Article
    BACKGROUND: Osteoarthritis (OA) has been reported to promote the progression of breast cancer (BC).
    OBJECTIVE: This study aims to identify essential genes associated with both BC and OA, explore the relationship between epithelial-mesenchymal transition (EMT)-related genes and the two diseases, and identify candidate drugs.
    METHODS: Genes related to both BC and OA were determined by text mining. Protein-protein interaction (PPI) analysis was then carried out, and the resulting genes were found to be related to EMT. The PPI network and the mRNA correlations of these genes were also analyzed, and several kinds of enrichment analysis were performed on them. A prognostic analysis examined their expression levels at different pathological stages, in different tissues, and in different immune cells. A drug-gene interaction database was used for potential drug discovery.
    RESULTS: A total of 1422 genes were identified as common to BC and OA, and 58 genes were found to be related to EMT. HDAC2 and TGFBR1 were significantly associated with poor overall survival. High HDAC2 expression plays a vital role in the increase in pathological stage. Four immune cell types might play a role in this process. Fifty-seven drugs with potential therapeutic effects were identified.
    CONCLUSION: EMT may be one of the mechanisms by which OA affects BC. The identified drugs may have therapeutic effects that could benefit patients with both diseases and broaden the indications for their use.
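    Enrichment analyses of this kind typically ask whether a gene list overlaps a pathway or gene set more than chance would predict. The following minimal sketch implements the standard one-sided hypergeometric test using only the standard library. The example counts are hypothetical and are not the study's 1422-gene or 58-gene sets.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def overlap_enrichment(query: Set[str], gene_set: Set[str], background: Set[str]) -> float:
    """One-sided enrichment p-value for the overlap of query with gene_set."""
    q, s = query & background, gene_set & background
    return hypergeom_sf(len(q & s), len(background), len(s), len(q))


if __name__ == "__main__":
    background = {f"g{i}" for i in range(20)}  # hypothetical 20-gene universe
    gene_set = {f"g{i}" for i in range(5)}     # hypothetical 5-gene set
    hits = {"g0", "g1", "g2", "g3"}
    print(round(overlap_enrichment(hits, gene_set, background), 6))  # 5/4845 = 0.001032
```

    In practice, p-values from many gene sets would also be corrected for multiple testing.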

  • Article type: Journal Article
    Gene essentiality is defined as the extent to which a gene is required for the survival and reproductive success of a living system. It can vary between genetic backgrounds and environments. Essential protein-coding genes have been well studied; however, the essentiality of non-coding regions is rarely reported, even though most regions of the human genome do not encode proteins. Determining the essentiality of non-coding genes is therefore needed. We developed iEssLnc models, which assign essentiality scores to lncRNA genes. To our knowledge, this is the first direct quantitative estimation of lncRNA gene essentiality. By combining a graph neural network with meta-path-guided random walks on the lncRNA-protein interaction network, iEssLnc models can perform quantitative genome-wide screening for essential lncRNA genes. We carried out validation and whole-genome screening in human cancer cell lines and the mouse genome. Compared with other methods transferred from protein-coding genes, iEssLnc achieved better performance. Enrichment analysis indicated that iEssLnc essentiality scores ranked essential lncRNA genes highly. Using the screening results of iEssLnc models, we estimated the number of essential lncRNA genes in human and mouse. Functional analysis showed that essential lncRNA genes interact significantly with microRNAs and cytoskeletal proteins, which may be of interest in experimental life sciences. All datasets and code for iEssLnc models have been deposited in GitHub (https://github.com/yyZhang14/iEssLnc).
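    A meta-path-guided random walk restricts each step to a prescribed sequence of node types. The following minimal sketch assumes a hypothetical lncRNA-protein network and the symmetric meta-path lncRNA-protein-lncRNA (L-P-L). The resulting walks would then feed an embedding or graph neural network. The exact meta-paths and model used by iEssLnc are not reproduced here.

```python
import random
from typing import Dict, List


def metapath_walk(graph: Dict[str, List[str]], node_type: Dict[str, str],
                  start: str, metapath: List[str], length: int,
                  rng: random.Random) -> List[str]:
    """Random walk whose node types follow `metapath` cyclically (first type == last type)."""
    assert metapath[0] == metapath[-1] == node_type[start]
    pattern = metapath[1:]  # types to cycle through after the start node
    walk = [start]
    step = 0
    while len(walk) < length:
        wanted = pattern[step % len(pattern)]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk


if __name__ == "__main__":
    # Hypothetical bipartite lncRNA-protein interaction network.
    graph = {"lnc1": ["p1", "p2"], "lnc2": ["p1"], "lnc3": ["p2"],
             "p1": ["lnc1", "lnc2"], "p2": ["lnc1", "lnc3"]}
    types = {"lnc1": "L", "lnc2": "L", "lnc3": "L", "p1": "P", "p2": "P"}
    print(metapath_walk(graph, types, "lnc1", ["L", "P", "L"], 7, random.Random(0)))
```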

  • Article type: Journal Article
    BACKGROUND: Studying genomic variation in rapidly evolving pathogens potentially enables identification of genes supporting their "core biology", being present, functional and expressed in all strains, or their "flexible biology", varying between strains. Genes supporting flexible biology may be considered "accessory", whilst the "core" gene set is likely to be important for common features of a pathogen species' biology, including virulence on all host genotypes. The wheat-pathogenic fungus Zymoseptoria tritici represents one of the most rapidly evolving threats to global food security and was the focus of this study.
    RESULTS: We constructed a pangenome of 18 European field isolates, 12 of which were also subjected to RNA-seq transcription profiling during infection. Combining these data, we predicted a "core" gene set of 9807 sequences that were (1) present in all isolates, (2) lacking inactivating polymorphisms and (3) expressed by all isolates. A large accessory genome, comprising 45% of the total genes, was also defined. We classified genetic and genomic polymorphism at both chromosomal and individual gene scales. Among core genes, proteins required for essential functions, including virulence, had lower-than-average sequence variability. Both core and accessory genomes encoded many small, secreted candidate effector proteins that likely interact with plant immunity. Viral vector-mediated transient in planta overexpression of 88 candidates failed to identify any that induced the leaf necrosis characteristic of disease. However, functional complementation of a non-pathogenic deletion mutant lacking five core genes demonstrated that full virulence was restored by re-introducing the single gene exhibiting the least sequence polymorphism and highest expression.
    CONCLUSIONS: These data support the combined use of pangenomics and transcriptomics to define genes that represent core, and potentially exploitable, weaknesses in rapidly evolving pathogens.
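    The three criteria for the core gene set can be expressed as a filter. The following minimal sketch uses hypothetical gene and isolate names. Because only 12 of the 18 isolates were RNA-seq profiled, it assumes that the expression criterion is checked only in the profiled isolates.

```python
from typing import Dict, Set


def core_genes(presence: Dict[str, Set[str]], inactivated: Dict[str, Set[str]],
               expressed: Dict[str, Set[str]], isolates: Set[str],
               profiled: Set[str]) -> Set[str]:
    """Genes present in all isolates, never inactivated, expressed in every profiled isolate."""
    core = set()
    for gene, present_in in presence.items():
        if not isolates <= present_in:
            continue  # (1) absent from at least one isolate
        if inactivated.get(gene):
            continue  # (2) inactivating polymorphism in some isolate
        if not profiled <= expressed.get(gene, set()):
            continue  # (3) not expressed in every RNA-seq-profiled isolate
        core.add(gene)
    return core


if __name__ == "__main__":
    isolates = {"i1", "i2", "i3"}
    presence = {"g1": {"i1", "i2", "i3"}, "g2": {"i1", "i2"},
                "g3": {"i1", "i2", "i3"}, "g4": {"i1", "i2", "i3"}}
    inactivated = {"g3": {"i2"}}
    expressed = {"g1": {"i1", "i2"}, "g3": {"i1", "i2"}, "g4": {"i1"}}
    print(core_genes(presence, inactivated, expressed, isolates, {"i1", "i2"}))  # {'g1'}
```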

  • Article type: Journal Article
    Over 300 essential genes have been predicted in the Pseudomonas aeruginosa genome using transposon sequencing. However, methods for reverse genetic analysis of essential genes are scarce. To address this issue, we developed a three-step protocol, consisting of integration of a deletion plasmid, introduction of a temperature-sensitive rescue plasmid, and excision of the integrated deletion plasmid, to construct plasmid-based temperature-sensitive alleles of essential genes. Using PA0006 as an example, we showed that PA0006(Ts) exhibited wild-type cell morphology at the permissive temperature but a filamentous form at restrictive temperatures. We further showed that the glycero-manno-heptose bisphosphate phosphatase GmhB of Escherichia coli shares 32.4% identity with PA0006p and functionally complemented the defect of PA0006(Ts) at 42°C. SDS-PAGE and Western blotting indicated the presence of complete core lipopolysaccharide (LPS) and B-band O-antigen in PA0006(Ts) at 30°C and their absence at 42°C. An isolated suppressor, sup, displayed wild-type-like cell morphology but no complete core LPS or O-antigen. Genome resequencing together with comparative transcriptomic profiling identified a candidate suppressor, the fructose-bisphosphate phosphatase gene fbp, whose promoter harbored a SNP and whose transcription in sup was not downregulated at 42°C compared with 30°C. It was further validated that fbp overexpression suppressed the lethality of PA0006(Ts) at 42°C. Taken together, our results demonstrate that PA0006 plays a role in regulating cell morphology and core LPS biosynthesis. This three-step protocol for constructing conditional lethal alleles in P. aeruginosa should be widely applicable to the genetic analysis of other essential genes of interest, including analysis of bypass suppressibility.
    IMPORTANCE: Microbial essential genes encode indispensable functions for cell growth and are therefore ideal targets for the development of new drugs. Essential genes are readily identified at the genome scale using transposon sequencing. However, genetic analysis of essential genes of interest has been hampered by limited methodologies. To address this issue, we developed a three-step protocol for constructing conditional alleles of essential genes in the opportunistic pathogen Pseudomonas aeruginosa. Using PA0006 as an example, we demonstrated that the plasmid-based PA0006(Ts) mutant exhibited defects in regulation of cell morphology, formation of intact core LPS, and attachment of the O-antigen at restrictive but not permissive temperatures. A suppressor of PA0006(Ts) was isolated through spontaneous mutation and showed restored cell morphology but not restored core oligosaccharide or O-antigen. This method should be widely applicable to phenotype and suppressibility analyses of other essential genes of interest in P. aeruginosa.
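    Transposon sequencing calls a gene essential when it tolerates far fewer insertions than its length would predict. The following is a deliberately simplified sketch of that logic. The gene coordinates, insertion positions, and fixed density-ratio cut-off are all hypothetical. Dedicated Tn-seq analysis tools use statistical models (for example, Gumbel- or HMM-based methods) rather than a single threshold.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


def call_essential(genes: Dict[str, Tuple[int, int]], insertions: List[int],
                   genome_length: int, ratio_cutoff: float = 0.1) -> List[str]:
    """Flag genes whose insertion density is below ratio_cutoff x genome-wide density."""
    sites = sorted(insertions)
    genome_density = len(sites) / genome_length
    essential = []
    for name, (start, end) in sorted(genes.items()):
        n = bisect_right(sites, end) - bisect_left(sites, start)  # insertions in [start, end]
        if n / (end - start + 1) < ratio_cutoff * genome_density:
            essential.append(name)
    return essential


if __name__ == "__main__":
    # Hypothetical 1 kb genome with an insertion every 10 bp except in 200-399.
    insertions = [p for p in range(0, 1000, 10) if not 200 <= p < 400]
    genes = {"geneA": (200, 399), "geneB": (500, 699)}
    print(call_essential(genes, insertions, 1000))  # ['geneA']
```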

  • Article type: Journal Article
    Identifying cancer-related genes helps in understanding the pathogenesis of cancer, developing targeted drugs, and creating new diagnostic and therapeutic methods. Given the complexity of biological laboratory methods and the increasing availability of high-throughput data, many network-based methods have been proposed to identify cancer-related genes from a global perspective. Some studies have focused on tissue-specific cancer networks. However, cancers from different tissues may share common features, and such methods may ignore the differences and similarities across cancers during model construction. In this work, to make full use of global network information, we first build a pan-cancer network with a differential network algorithm; this network contains heterogeneous data not only across multiple cancer types but also between tumor and normal samples. Second, node representation vectors are learned by network embedding. Unlike ranking analysis-based methods, we use integrative network analysis to transform the cancer-related gene identification problem into a binary classification problem. The final results are obtained via ensemble classification. We applied these methods to the most commonly used gene expression data covering six tissue-specific cancer types. As a result, an integrative pan-cancer network and several biologically meaningful results were obtained; for example, nine genes were ultimately identified as potential pan-cancer-related genes. Most of these genes have been reported in published studies, showing our method's potential for identifying driver gene candidates for further biological experimental verification.
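    The differential network step compares gene-gene associations between tumor and normal samples. The following minimal sketch uses Pearson correlation as the association measure. The expression values are hypothetical, and the cut-off on the absolute change in correlation is arbitrary. The study's own differential network algorithm, network embedding, and ensemble classifier are not reproduced here.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def differential_edges(tumor: Dict[str, List[float]], normal: Dict[str, List[float]],
                       cutoff: float = 1.0) -> Set[Tuple[str, str]]:
    """Gene pairs whose correlation shifts by more than `cutoff` between conditions."""
    edges = set()
    for g1, g2 in combinations(sorted(tumor), 2):
        delta = pearson(tumor[g1], tumor[g2]) - pearson(normal[g1], normal[g2])
        if abs(delta) > cutoff:
            edges.add((g1, g2))
    return edges


if __name__ == "__main__":
    tumor = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
    normal = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [1, 3, 2, 4]}
    print(sorted(differential_edges(tumor, normal)))  # [('A', 'B'), ('B', 'C')]
```

    Pairs whose correlation stays the same in both conditions (A and C here) drop out. Only rewired relationships enter the differential network.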

  • Article type: Journal Article
    Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic units controlling biological traits. They underpin the basic structures and functions of organisms and store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. Environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena, including growth, decline, illness, aging, and death. During the evolution of organisms, a class of genes has been conserved across multiple species. These genes are often located on the leading strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, those that are irreplaceable for an organism's life activities are essential genes. For example, when starch is the only energy source, the genes related to starch digestion are essential: without them, the organism dies because it cannot obtain enough energy to maintain basic functions. The functions of the proteins encoded by these genes are thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful for assessing potential disease risks and for determining the prognosis and development of diseases. It is also useful for developing personalized medication and providing targeted health guidance to improve quality of life. Therefore, identifying important and essential genes is of great theoretical and practical significance.
    In this paper, we review the research status of essential genes and bacterial essential gene databases, describe computational methods for predicting essential genes based on communication coding theory, and discuss the significance and practical application value of essential genes.