Essential proteins

  • Article type: Journal Article
    Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches often struggle to identify essential proteins accurately because they rely primarily on information derived from protein-protein interaction (PPI) networks. Although some researchers have attempted to integrate biological data with PPI networks to predict essential proteins, designing effective integration methods remains a challenge. To address these issues, this paper presents the ACDMBI model. ACDMBI comprises two key modules: feature extraction and classification. Feature extraction draws on three distinct data sources. First, structural features of proteins are extracted from the PPI network through community division and then further optimized using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Second, protein features are extracted from gene expression data using Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Third, protein features are derived by mapping subcellular localization data to a one-dimensional vector and processing it through fully connected layers. In the classification phase, we integrate the features extracted from the three data sources and construct a multi-layer deep neural network (DNN) for protein classification. Experimental results on Saccharomyces cerevisiae (brewer's yeast) data show the ACDMBI model's superior performance, with an AUC of 0.9533 and an AUPR of 0.9153. Ablation experiments further show that effectively integrating features from diverse biological information significantly improves the model's performance.
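    To make the GCN refinement step concrete, the following is a minimal sketch of a single weight-free graph-convolution propagation over a toy PPI network. It is not the ACDMBI implementation: the function name `gcn_propagate`, the three-protein network, and the one-dimensional features are all illustrative assumptions, and a real GCN layer would also apply a learned weight matrix and a nonlinearity.

```python
import math

def gcn_propagate(adj, feats):
    """One weight-free GCN propagation step: D^-1/2 (A + I) D^-1/2 H.

    adj   -- symmetric 0/1 adjacency matrix of the PPI network (list of lists)
    feats -- per-protein feature vectors H (list of lists)
    """
    n = len(adj)
    # Add self-loops so each protein keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Degrees are at least 1 because of the self-loops.
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    dim = len(feats[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            w = inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j]
            if w:
                for k in range(dim):
                    row[k] += w * feats[j][k]
        out.append(row)
    return out

# Hypothetical toy network: protein 0 - protein 1 - protein 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[1.0], [0.0], [1.0]]
smoothed = gcn_propagate(adj, feats)
```

    In this toy case the middle protein starts with a zero feature and picks up signal from both neighbors, which is the sense in which graph convolution refines structural features using network context.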

  • Article type: Journal Article
    Essential proteins are those considered indispensable for an organism's viability, reproductive capability, and other fundamental physiological functions. Conventional biological assays for identifying essential proteins are time-consuming, labor-intensive, and expensive. Computational methods are therefore widely regarded as the fastest and most effective way to identify essential proteins. Although deep learning (DL) is a popular choice in machine learning (ML) applications, it is not recommended for this sequence-feature-based work because high-quality training sets of positive and negative samples are scarce. Some recent DL studies do address limited data availability, and exploring them is left to future work. Conventional ML techniques are thus used in this work because, in this setting, they outperform DL methods. Accordingly, a technique called EPI-SF is proposed, which uses ML to identify essential proteins within the protein-protein interaction network (PPIN). The protein sequence is the primary determinant of protein structure and function, so relevant sequence features are first extracted from the proteins within the PPIN. These features are then used as input to various ML models, including an XGBoost classifier, an AdaBoost classifier, logistic regression (LR), a support vector classifier (SVM), a decision tree (DT), a random forest (RF), and naïve Bayes (NB), with the objective of detecting the essential proteins within the PPIN. The primary investigation, conducted on yeast, examined the performance of these ML models on the yeast PPIN. Among them, the RF model was the most effective, with precision, recall, F1-score, and AUC values of 0.703, 0.720, 0.711, and 0.745, respectively. It also outperformed other state-of-the-art methods, both those based on traditional centrality measures such as betweenness centrality (BC) and closeness centrality (CC) and deep learning methods such as DeepEP, as emphasized in the results section. Given this favorable performance, EPI-SF was then used to predict novel essential proteins in the human PPIN. Because viruses tend to selectively target essential proteins involved in disease transmission within the human PPIN, the possible involvement of these predicted proteins in COVID-19 and other related severe diseases was also investigated.
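    As a small illustration of the two ideas above, the sketch below computes one simple sequence feature (amino acid composition) and checks that the reported F1-score is consistent with the reported precision and recall. The abstract does not list which sequence features EPI-SF actually uses, so amino acid composition is an assumed stand-in, and the sequence "MKKLA" and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical

def aa_composition(seq):
    """Return the fraction of each standard amino acid in a protein sequence."""
    counts = {a: 0 for a in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:          # skip non-standard symbols such as X or *
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy sequence; a 20-dimensional vector like this could feed an ML model.
features = aa_composition("MKKLA")

# Precision and recall as reported for the RF model in the abstract.
f1 = f1_score(0.703, 0.720)
```

    Rounded to three decimals, `f1` agrees with the 0.711 stated in the abstract, so the three reported metrics are mutually consistent.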

  • Article type: Journal Article
    Klebsiella pneumoniae is an opportunistic multidrug-resistant bacterial pathogen responsible for various health care-associated infections. The prediction of proteins that are essential for the survival of bacterial pathogens can greatly facilitate the drug development and discovery pipeline toward target identification. To this end, the present study reports a comprehensive computational approach integrating bioinformatics and systems biology-based methods to identify essential proteins of K. pneumoniae involved in vital processes. From the proteome of this pathogen, we predicted a total of 854 essential proteins based on sequence, protein-protein interaction (PPI) and genome-scale metabolic model methods. These predicted essential proteins are involved in vital processes for cellular regulation such as translation, metabolism, and biosynthesis of essential factors, among others. Cluster analysis of the PPI network revealed the highly connected modules involved in the basic functionality of the organism. Further, the predicted consensus set of essential proteins of K. pneumoniae was evaluated by comparing them with existing resources (NetGenes and PATHOgenex) and literature. The findings of this study offer guidance toward understanding cell functionality, thereby facilitating the understanding of pathogen systems and providing a way forward to shortlist potential therapeutic candidates for developing novel antimicrobial agents against K. pneumoniae. In addition, the research strategy presented herein is a fusion of sequence and systems biology-based approaches that offers prospects as a model to predict essential proteins for other pathogens.
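    One simple way to combine predictions from several methods into a consensus set is majority voting, sketched below. The abstract does not say how the consensus set was formed, so the voting rule, the `min_votes` parameter, and the protein identifiers P1 to P5 are assumptions made purely for illustration.

```python
from collections import Counter

def consensus(predictions, min_votes=2):
    """Return proteins predicted essential by at least `min_votes` methods.

    predictions -- dict mapping a method name to its set of predicted proteins
    """
    votes = Counter()
    for method_hits in predictions.values():
        votes.update(set(method_hits))   # each method votes once per protein
    return {protein for protein, n in votes.items() if n >= min_votes}

# Hypothetical predictions from the three kinds of method named in the abstract.
predictions = {
    "sequence": {"P1", "P2", "P3"},
    "ppi": {"P2", "P3", "P4"},
    "metabolic_model": {"P3", "P5"},
}
agreed = consensus(predictions)               # supported by two or more methods
strict = consensus(predictions, min_votes=3)  # supported by all three methods
```

    Raising `min_votes` trades coverage for confidence: the strict set is smaller but every member is supported by all three lines of evidence.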

  • Article type: Journal Article
    BACKGROUND: The identification of essential proteins is of great significance in biology and pathology. However, protein-protein interaction (PPI) data obtained through high-throughput technology include a high number of false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify essential proteins.
    RESULTS: In this paper, we propose a novel method named SESN for identifying essential proteins. It is a seed expansion method based on PPI sub-networks and multiple biological characteristics. Firstly, SESN utilizes gene expression data to construct PPI sub-networks. Secondly, seed expansion is performed simultaneously in each sub-network, and the expansion process is based on the topological features of predicted essential proteins. Thirdly, the error correction mechanism is based on multiple biological characteristics and the entire PPI network. Finally, SESN analyzes the impact of each biological characteristic, including protein complex, gene expression data, GO annotations, and subcellular localization, and adopts the biological data with the best experimental results. The output of SESN is a set of predicted essential proteins.
    CONCLUSIONS: The analysis of each component of SESN indicates the effectiveness of all components. We conduct comparison experiments using three datasets from two species, and the experimental results demonstrate that SESN achieves superior performance compared to other methods.
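    The general idea of seed expansion on a network can be sketched as a greedy loop that repeatedly adds the candidate most strongly connected to the proteins already selected. This is a generic illustration, not SESN itself: the scoring rule (number of links into the selected set), the alphabetical tie-break, and the toy network are assumptions, and the sketch omits SESN's sub-network construction and error correction.

```python
def expand_seeds(edges, seeds, target_size):
    """Greedily grow a seed set along network edges until it reaches target_size."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    selected = set(seeds)
    while len(selected) < target_size:
        # Candidates are unselected proteins adjacent to the current set.
        frontier = set()
        for p in selected:
            frontier |= neighbors.get(p, set())
        frontier -= selected
        if not frontier:
            break  # nothing left to expand into
        # Prefer the candidate with the most links into the selected set;
        # break ties alphabetically so the result is deterministic.
        best = min(frontier, key=lambda p: (-len(neighbors[p] & selected), p))
        selected.add(best)
    return selected

# Hypothetical toy PPI sub-network.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
predicted = expand_seeds(edges, {"A"}, 3)
```

    Starting from the single seed "A", the loop pulls in "B" and then "C", the densely connected triangle, before it would reach the sparser tail of the network.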

  • Article type: Journal Article
    Mycoplasma pneumoniae is a significant causative agent of community-acquired pneumonia, causing acute inflammation in the upper and lower respiratory tract as well as extrapulmonary syndromes. In particular, the elderly and infants are at greater risk of developing severe, life-threatening pneumonia caused by M. pneumoniae. Yet, the global increase in antimicrobial resistance against antibiotics for the treatment of M. pneumoniae infection highlights the urgent need to explore novel drug targets. To this end, bioinformatics approaches, such as subtractive genomics, can be employed to identify specific metabolic pathways and essential proteins unique to the pathogen that could be potential targets for new drugs. In this study, we implemented a subtractive genomics approach to identify 61 metabolic pathways and 42 essential proteins that are unique to M. pneumoniae. A subsequent screening in the DrugBank database revealed three druggable proteins with similarity to FDA-approved small-molecule drugs, and finally, the compound CHEBI:97093 was identified as a promising novel putative drug target. These findings can provide crucial insights for the development of highly effective drugs that selectively inhibit the pathogen-specific metabolic pathways, leading to better management and treatment of M. pneumoniae infections.

  • Article type: Journal Article
    Essential proteins play a vital role in the development and reproduction of cells, and identifying them helps to understand the basic requirements for cell survival. Because biological experimental methods for discovering essential proteins are time-consuming, costly, and inefficient, computational methods have gained increasing attention. Early approaches identified essential proteins mainly through centrality measures on protein-protein interaction (PPI) networks, whose identification rate is limited by the many false positives in PPI networks. In this study, a purified PPI network is first introduced to reduce the impact of false positives. Second, by analyzing the similarity relationship between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Third, based on subcellular localization and orthology data, a protein subcellular localization score and an ortholog score are calculated, respectively. Fourth, an analysis of a large number of methods based on multi-feature fusion reveals a special relationship among features, called the dominance relationship, and a novel model based on this relationship is proposed. Finally, NSC, the subcellular localization score, and the ortholog score are fused by the dominance relationship model to give a new method called NSO. To verify the performance of NSO, it is compared with seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, WDC) on yeast datasets. The experimental results show that NSO achieves a higher identification rate than the other methods.
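    A centrality built from the similarity between a protein and its neighbors could look like the sketch below, which sums the Jaccard similarity of closed neighborhoods over each protein's interaction partners. The abstract does not give the NSC formula, so the Jaccard measure, the use of closed neighborhoods, and the toy network are assumptions chosen only to illustrate the idea.

```python
def neighborhood_similarity(edges):
    """Score each protein by the summed Jaccard similarity to its neighbors.

    Similarity is computed on closed neighborhoods (a node plus its neighbors),
    so two directly linked proteins always share at least themselves.
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    scores = {}
    for p, partners in nbrs.items():
        closed_p = partners | {p}
        total = 0.0
        for q in partners:
            closed_q = nbrs[q] | {q}
            total += len(closed_p & closed_q) / len(closed_p | closed_q)
        scores[p] = total
    return scores

# Hypothetical toy PPI network: a triangle A-B-C with a pendant protein D on C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
scores = neighborhood_similarity(edges)
ranking = sorted(scores, key=lambda p: (-scores[p], p))
```

    Unlike plain degree, a score of this kind rewards proteins whose partners share many of the same partners, which is one way to favor densely interconnected modules.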

  • Article type: Journal Article
    Streptococcus pneumoniae is a notorious Gram-positive pathogen present asymptomatically in the human nasopharynx. According to the World Health Organization (WHO), pneumococcus causes approximately one million deaths yearly. Antibiotic resistance in S. pneumoniae is raising considerable concern around the world, and there is an immediate need to address the problems arising from persistent S. pneumoniae infections. In the present study, subtractive proteomics was used to reduce the entire proteome of the pathogen, consisting of 1947 proteins, to a small number of possible targets. Various bioinformatics tools and software were applied to the discovery of novel inhibitors. CD-HIT analysis revealed 1887 non-redundant sequences in the entire proteome. These non-redundant proteins were submitted to BLASTp against the human proteome, and 1423 proteins were screened as non-homologous. Further, databases of essential genes (DEGG) and J browser identified almost 171 essential proteins. Moreover, the non-homologous, essential proteins were analyzed in the KEGG Pathway Database, which shortlisted six unique proteins. In addition, the subcellular localization of these unique proteins was checked and cytoplasmic proteins were chosen for druggability analysis, which yielded three proteins, namely DNA-binding response regulator (SPD_1085), UDP-N-acetylmuramate-L-alanine ligase (SPD_1349), and RNA polymerase sigma factor (SPD_0958), which can act as promising drug targets to limit the toxicity caused by S. pneumoniae. The 3D structures of these proteins were predicted with SWISS-MODEL using homology modeling. Molecular docking with PyRx (version 0.8) was then used to screen a library of phytochemicals retrieved from the PubChem and ZINC databases, together with already approved drugs from the DrugBank database, against the novel druggable targets to assess their binding affinity for the receptor proteins. The top two molecules for each receptor protein were selected based on binding affinity, RMSD value, and the highest conformation. Finally, absorption, distribution, metabolism, excretion, and toxicity (ADMET) analyses were carried out using the SwissADME and ProTox tools. This research supports the discovery of cost-effective drugs against S. pneumoniae. However, more in vivo and in vitro research on these targets is needed to investigate their pharmacological efficacy and their function as efficient inhibitors.
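    The first stages of a subtractive pipeline like this one amount to a chain of filters, which the sketch below imitates on toy records. Everything in it is a simplification for illustration: exact-duplicate removal stands in for CD-HIT clustering, the boolean fields `human_homolog` and `essential` stand in for BLASTp and essential-gene database lookups, and the protein records are hypothetical.

```python
def subtractive_filter(proteins):
    """Keep IDs of non-redundant, non-human-homologous, essential proteins."""
    seen_sequences = set()
    kept = []
    for p in proteins:
        # Step 1: drop redundant sequences (crude stand-in for CD-HIT).
        if p["sequence"] in seen_sequences:
            continue
        seen_sequences.add(p["sequence"])
        # Step 2: drop proteins with a human homolog (stand-in for BLASTp),
        # since a drug hitting them might also hit the host protein.
        if p["human_homolog"]:
            continue
        # Step 3: keep only proteins annotated as essential.
        if not p["essential"]:
            continue
        kept.append(p["id"])
    return kept

# Hypothetical annotated proteome.
proteome = [
    {"id": "T1", "sequence": "MKV", "human_homolog": False, "essential": True},
    {"id": "T2", "sequence": "MKV", "human_homolog": False, "essential": True},
    {"id": "T3", "sequence": "MLA", "human_homolog": True,  "essential": True},
    {"id": "T4", "sequence": "MGS", "human_homolog": False, "essential": False},
    {"id": "T5", "sequence": "MTT", "human_homolog": False, "essential": True},
]
candidates = subtractive_filter(proteome)
```

    Each filter can only shrink the candidate list, which matches the funnel reported in the abstract (1947, then 1887, then 1423, then 171 proteins).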

  • Article type: Journal Article
    Gonorrhea is an urgent antimicrobial resistance threat, and its therapeutic options are becoming increasingly limited. Moreover, no vaccine against it has been approved so far. Hence, the present study aimed to introduce novel immunogenic and drug targets against antibiotic-resistant Neisseria gonorrhoeae strains. In the first step, the core proteins of 79 complete genomes of N. gonorrhoeae were retrieved. Next, the surface-exposed proteins were evaluated for antigenicity, allergenicity, conservancy, and B-cell and T-cell epitopes to introduce promising immunogenic candidates. Then, their interactions with human Toll-like receptors (TLR-1, 2, and 4) and their immunoreactivity in eliciting humoral and cellular immune responses were simulated. In parallel, to identify novel broad-spectrum drug targets, the cytoplasmic and essential proteins were detected. The N. gonorrhoeae metabolome-specific proteins were then compared with the drug targets in DrugBank, and novel drug targets were retrieved. Finally, the availability of Protein Data Bank (PDB) files and the prevalence among the ESKAPE group and common sexually transmitted infection (STI) agents were assessed. Our analyses identified ten novel putative immunogenic targets: murein transglycosylase A, PBP1A, Opa, NlpD, Azurin, MtrE, RmpM, LptD, NspA, and TamA. Moreover, four potential broad-spectrum drug targets were identified: UMP kinase, GlyQ, HU family DNA-binding protein, and IF-1. Some of the shortlisted immunogenic and drug targets have confirmed roles in adhesion, immune evasion, and antibiotic resistance, and can induce bactericidal antibodies. Other immunogenic and drug targets might be associated with the virulence of N. gonorrhoeae as well. Thus, further experimental studies and site-directed mutagenesis are recommended to investigate the role of the potential vaccine and drug targets in the pathogenesis of N. gonorrhoeae. Efforts to propose novel vaccine and drug targets appear to be paving the way for a prevention-treatment strategy against this bacterium. Additionally, a combination of bactericidal monoclonal antibodies and antibiotics is a promising approach to treating N. gonorrhoeae infection.

  • Article type: Journal Article
    Many disease-related genes have been found to be associated with cancer diagnosis, which is useful for understanding the pathophysiology of cancer, generating targeted drugs, and developing new diagnostic and treatment techniques. With the development of the pan-cancer project and the ongoing expansion of sequencing technology, many scientists are focusing on mining genes common to various cancer types from The Cancer Genome Atlas (TCGA). In this study, motivated by the benefits of reverse genetics, we attempted to infer pan-cancer associated genes by examining the microbial model organism Saccharomyces cerevisiae (yeast) through homology matching. First, a background network of protein-protein interactions and a pathogenic gene set involving several cancer types in humans and yeast were created. The homology between human genes and yeast genes was then determined by homology matching, and the corresponding interaction sub-network was obtained. This follows the principle that homologous genes descended from a common ancestor may have similar expression. Then, using bidirectional long short-term memory (BiLSTM) in combination with adaptive integration of heterogeneous information, we further explored the topological characteristics of the yeast protein interaction network and presented a node representation score to evaluate the importance of nodes in the graph. Finally, the important yeast genes identified by ensemble classifiers were mapped to their human homologs, which may be regarded as genes connected to all types of cancer. One way to assess the performance of the BiLSTM model is through experiments on the database. In addition, enrichment analysis, survival analysis, and other outcomes can be used to confirm the biological importance of the prediction results. The full experimental protocols and programs are available at https://github.com/zhuyuan-cug/AI-BiLSTM/tree/master.

  • Article type: Journal Article
    BACKGROUND: Essential proteins are indispensable to the development and survival of cells. Identifying essential proteins not only helps in understanding the minimal requirements for cell survival, but also has practical significance in disease diagnosis, drug design, and medical treatment. With the rapid accumulation of protein-protein interaction (PPI) data, computationally identifying essential proteins from protein-protein interaction networks (PINs) is becoming increasingly popular. To date, a number of approaches for essential protein identification based on PINs have been developed.
    RESULTS: In this paper, we propose a new and effective approach called iMEPP to identify essential proteins from PINs by fusing multiple types of biological data and applying the influence maximization mechanism to the PINs. Concretely, we first integrate PPI data, gene expression data, and Gene Ontology to construct weighted PINs, to alleviate the impact of the high false-positive rate in the raw PPI data. Then, we define the influence scores of nodes in PINs using both orthology data and PIN topological information. Finally, we develop an influence discount algorithm to identify essential proteins based on the influence maximization mechanism.
    CONCLUSIONS: We applied our method to identifying essential proteins from the Saccharomyces cerevisiae PIN. Experiments show that our iMEPP method outperforms existing methods, which validates its effectiveness and advantages.
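    The discounting idea behind influence maximization heuristics can be sketched as follows: after a node is selected, the scores of its neighbors are reduced in proportion to the edge weight, so later picks favor other regions of the network. This is a generic illustration and not the iMEPP algorithm; the update rule, the `discount` parameter, the assumption that edge weights lie in [0, 1], and the toy network and scores are all hypothetical.

```python
def influence_discount(weights, scores, k, discount=0.5):
    """Select k nodes by score, discounting neighbors of each selected node.

    weights -- {(u, v): w} for undirected edges, with w assumed to lie in [0, 1]
    scores  -- {node: initial influence score}
    """
    nbrs = {}
    for (u, v), w in weights.items():
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w

    current = dict(scores)   # work on a copy so the input is not modified
    chosen = []
    while len(chosen) < k and current:
        # Highest current score wins; ties go to the alphabetically first node.
        best = max(sorted(current), key=lambda n: current[n])
        chosen.append(best)
        del current[best]
        # Strongly linked neighbors lose the most score.
        for q, w in nbrs.get(best, {}).items():
            if q in current:
                current[q] *= 1.0 - discount * w
    return chosen

# Hypothetical weighted PIN and initial influence scores.
weights = {("A", "B"): 1.0, ("B", "C"): 0.2, ("C", "D"): 0.9}
scores = {"A": 1.0, "B": 0.9, "C": 0.8, "D": 0.3}
top2 = influence_discount(weights, scores, k=2)
top2_no_discount = influence_discount(weights, scores, k=2, discount=0.0)
```

    With discounting, the second pick moves from "B" to "C" because "B" is tightly linked to the already selected "A"; with `discount=0.0` the procedure reduces to a plain top-k ranking.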