Essential proteins

  • Article type: Journal Article
    Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches rely primarily on information derived from protein-protein interaction (PPI) networks, and they often struggle to discern essential proteins accurately. Some researchers have attempted to integrate biological data with PPI networks to predict essential proteins, but designing effective integration methods remains a challenge. To address these issues, this paper presents the ACDMBI model. ACDMBI consists of two key modules: feature extraction and classification. For feature extraction, we draw on three distinct data sources. First, structural features of proteins are extracted from the PPI network through community division, and these features are further refined using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Next, protein features are extracted from gene expression data using Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Finally, protein features are derived from subcellular localization data by mapping it to a one-dimensional vector and processing it through fully connected layers. In the classification phase, the features extracted from the three data sources are integrated and passed to a multi-layer deep neural network (DNN) for protein classification. Experimental results on Saccharomyces cerevisiae (brewer's yeast) data show that ACDMBI achieves superior performance, with an AUC of 0.9533 and an AUPR of 0.9153. Ablation experiments further show that effectively integrating features from diverse biological information significantly improves the model's performance.
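
The GCN step above refines each protein's structural features by aggregating the features of its interaction partners. The sketch below shows one standard GCN layer, the symmetric-normalized propagation H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python on a toy four-protein star graph. The graph, the feature values, and the identity weight matrix are hypothetical, and ACDMBI's community-division features, GAT stage, layer sizes, and training are not reproduced; it only illustrates the propagation rule.

```python
import math


def gcn_layer(adj, features, weights):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:      n x n symmetric 0/1 adjacency matrix (list of lists)
    features: n x f node feature matrix H
    weights:  f x k weight matrix W (learned in practice, fixed here)
    """
    n = len(adj)
    # Add self-loops so each protein keeps part of its own signal.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Degree of each node in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: a_hat[i][j] / sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate (normalized) neighbour features.
    agg = [[sum(norm[i][m] * features[m][c] for m in range(n))
            for c in range(len(features[0]))] for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weights[c][k] for c in range(len(weights))))
             for k in range(len(weights[0]))] for i in range(n)]


# Toy PPI graph: protein 0 interacts with proteins 1, 2 and 3 (a star).
adj = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
# Hypothetical 2-dimensional structural features per protein.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
# Identity weights so the output shows the pure aggregation effect.
weights = [[1.0, 0.0], [0.0, 1.0]]

for protein, row in enumerate(gcn_layer(adj, features, weights)):
    print(protein, [round(x, 3) for x in row])
```

The hub protein ends up mixing its own feature with those of its three partners, while each leaf protein is pulled partly toward the hub; stacking such layers and learning W is what a trained GCN adds on top of this rule.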






  • Article type: Journal Article
    Essential proteins are considered indispensable for an organism's viability, reproduction, and other fundamental physiological functions. Identifying essential proteins with conventional biological assays is slow, labor-intensive, and expensive. Computational methods are therefore widely regarded as the fastest and most effective way to discern essential proteins. Although deep learning (DL) is a popular choice in machine learning (ML) applications, it is not recommended for this work, which is based on sequence features, because high-quality training sets of positive and negative samples are scarce. Some recent DL studies have nonetheless addressed settings with limited data, and these fall within our future scope of work. Conventional ML techniques are therefore used in this work because of their superior performance compared with DL methodologies. With this in mind, we propose EPI-SF, a technique that employs ML to identify essential proteins within the protein-protein interaction network (PPIN). Because the protein sequence is the primary determinant of protein structure and function, relevant sequence features are first extracted for the proteins in the PPIN. These features are then used as input to various ML models, including the XGBoost classifier, the AdaBoost classifier, logistic regression (LR), support vector classification (SVM), decision tree (DT), random forest (RF), and naïve Bayes (NB), with the objective of detecting the essential proteins within the PPIN. The primary investigation, conducted on yeast, examined the performance of these ML models on the yeast PPIN. 
Among these models, RF was the most effective, with precision, recall, F1-score, and AUC values of 0.703, 0.720, 0.711, and 0.745, respectively. As emphasized in the results section, it also outperforms state-of-the-art methods based on traditional centrality measures, such as betweenness centrality (BC) and closeness centrality (CC), as well as deep learning methods such as DeepEP. Given this favorable performance, EPI-SF is then employed to predict novel essential proteins in the human PPIN. Because viruses tend to selectively target essential proteins involved in disease transmission within the human PPIN, investigations are conducted to assess the probable involvement of these proteins in COVID-19 and other related severe diseases.
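
The abstract does not list which sequence features EPI-SF extracts. As a hedged illustration, the sketch below computes amino acid composition, one commonly used sequence descriptor (assumed here, not confirmed as part of EPI-SF), together with the precision, recall, and F1-score metrics quoted for the RF model. The example sequence and labels are made up; in practice the composition vectors would be fed to classifiers such as RF, which are not reproduced here.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every protein maps to a
# feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in `sequence` (20-dim vector).

    Non-standard letters such as X or U are ignored; a sequence with no
    standard residues maps to an all-zero vector.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = essential)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up sequence and labels for illustration only.
features = amino_acid_composition("MKKLLA")
print([round(x, 3) for x in features])
print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```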






  • Article type: Journal Article
    Klebsiella pneumoniae is an opportunistic, multidrug-resistant bacterial pathogen responsible for various healthcare-associated infections. Predicting the proteins that are essential for the survival of bacterial pathogens can greatly facilitate target identification in the drug discovery and development pipeline. To this end, the present study reports a comprehensive computational approach integrating bioinformatics and systems biology-based methods to identify essential proteins of K. pneumoniae involved in vital processes. From the proteome of this pathogen, we predicted a total of 854 essential proteins using sequence-based, protein-protein interaction (PPI)-based, and genome-scale metabolic model-based methods. These predicted essential proteins are involved in vital processes for cellular regulation, such as translation, metabolism, and the biosynthesis of essential factors, among others. Cluster analysis of the PPI network revealed highly connected modules involved in the basic functionality of the organism. Further, the predicted consensus set of essential proteins of K. pneumoniae was evaluated by comparison with existing resources (NetGenes and PATHOgenex) and the literature. The findings of this study offer insight into cell functionality and pathogen systems, and provide a way forward to shortlist potential therapeutic candidates for developing novel antimicrobial agents against K. pneumoniae. In addition, the research strategy presented herein, a fusion of sequence-based and systems biology-based approaches, offers prospects as a model for predicting essential proteins in other pathogens.
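
The abstract does not say how the sequence-, PPI-, and metabolic-model-based predictions are combined into the consensus set. The sketch below assumes a simple majority vote (a protein is kept if at least two of the three methods predict it as essential) and measures agreement with a reference set standing in for a resource such as NetGenes or PATHOgenex. The protein IDs and the voting threshold are hypothetical.

```python
from collections import Counter


def consensus_essential(predictions, min_votes=2):
    """Proteins predicted essential by at least `min_votes` methods.

    predictions: dict mapping method name -> set of protein IDs
    """
    votes = Counter(pid for predicted in predictions.values() for pid in predicted)
    return {pid for pid, n in votes.items() if n >= min_votes}


def overlap_with_reference(predicted, reference):
    """Fraction of predicted proteins that also appear in a reference set."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


# Hypothetical protein IDs from the three prediction routes.
predictions = {
    "sequence": {"P1", "P2", "P3", "P5"},
    "ppi": {"P2", "P3", "P4"},
    "metabolic_model": {"P3", "P4", "P6"},
}
consensus = consensus_essential(predictions, min_votes=2)
reference = {"P2", "P3", "P7"}  # stand-in for an external essential-gene resource
print(sorted(consensus), overlap_with_reference(consensus, reference))
```

Raising `min_votes` to 3 turns the vote into a strict intersection, trading coverage for agreement among all three methods.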






  • Article type: Journal Article
    BACKGROUND: The identification of essential proteins is of great significance in biology and pathology. However, protein-protein interaction (PPI) data obtained through high-throughput technology include a high number of false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify essential proteins.
    RESULTS: In this paper, we propose a novel method named SESN for identifying essential proteins. It is a seed expansion method based on PPI sub-networks and multiple biological characteristics. First, SESN uses gene expression data to construct PPI sub-networks. Second, seed expansion is performed simultaneously in each sub-network, guided by the topological features of the predicted essential proteins. Third, an error-correction mechanism is applied based on multiple biological characteristics and the entire PPI network. Finally, SESN analyzes the impact of each biological characteristic, including protein complexes, gene expression data, GO annotations, and subcellular localization, and adopts the biological data that yield the best experimental results. The output of SESN is a set of predicted essential proteins.
    CONCLUSIONS: An analysis of each component of SESN shows that every component is effective. We conducted comparison experiments on three datasets from two species, and the results demonstrate that SESN outperforms the other methods.
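
The abstract gives SESN's steps but not their exact rules. The sketch below is a hypothetical simplification of the first two steps: a time-point sub-network keeps only interactions between proteins whose expression meets an assumed threshold, and a greedy seed expansion starts from the highest-degree protein and repeatedly adds the neighbour most connected to the growing set. SESN's actual expansion criteria, its error-correction mechanism, and its use of complexes, GO annotations, and localization are not reproduced.

```python
def active_subnetwork(edges, expression, time_point, threshold):
    """Keep PPI edges whose two proteins are both 'active' at time_point.

    A protein is treated as active if its expression value at that time
    point is >= threshold (an assumed rule for illustration).
    """
    active = {p for p, profile in expression.items()
              if profile[time_point] >= threshold}
    return {(a, b) for a, b in edges if a in active and b in active}


def seed_expansion(edges, size):
    """Greedily grow a protein set of up to `size` members from one seed."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return []
    # Seed: highest-degree protein. max() keeps the first maximum, so
    # iterating over sorted IDs breaks ties by ID.
    seed = max(sorted(neighbours), key=lambda p: len(neighbours[p]))
    selected, chosen = [seed], {seed}
    while len(selected) < size:
        frontier = {n for p in chosen for n in neighbours[p]} - chosen
        if not frontier:
            break
        # Prefer the candidate with most links into the set, then higher degree.
        best = max(sorted(frontier),
                   key=lambda p: (len(neighbours[p] & chosen), len(neighbours[p])))
        selected.append(best)
        chosen.add(best)
    return selected


# Hypothetical PPI edges and two-time-point expression profiles.
edges = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")}
expression = {"A": [5, 1], "B": [4, 4], "C": [6, 2],
              "D": [3, 3], "E": [2, 5], "F": [0, 4]}

sub = active_subnetwork(edges, expression, time_point=0, threshold=3)
print(sorted(sub))
print(seed_expansion(sub, 3))
```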






  • Article type: Journal Article
    Mycoplasma pneumoniae is a significant causative agent of community-acquired pneumonia, causing acute inflammation of the upper and lower respiratory tract as well as extrapulmonary syndromes. The elderly and infants in particular are at greater risk of developing severe, life-threatening pneumonia caused by M. pneumoniae. The global increase in resistance to the antibiotics used to treat M. pneumoniae infection highlights the urgent need to explore novel drug targets. To this end, bioinformatics approaches such as subtractive genomics can be employed to identify metabolic pathways and essential proteins unique to the pathogen that could be potential targets for new drugs. In this study, we implemented a subtractive genomics approach and identified 61 metabolic pathways and 42 essential proteins that are unique to M. pneumoniae. Subsequent screening against the DrugBank database revealed three druggable proteins with similarity to FDA-approved small-molecule drugs, and finally, the compound CHEBI:97093 was identified as a promising novel putative drug candidate. These findings can provide crucial insights for the development of highly effective drugs that selectively inhibit pathogen-specific metabolic pathways, leading to better management and treatment of M. pneumoniae infections.






  • Article type: Journal Article
    Essential proteins play a vital role in the development and reproduction of cells, and identifying them helps us understand the basic requirements for cell survival. Because biological experimental methods for discovering essential proteins are time-consuming, costly, and inefficient, computational methods have gained increasing attention. Early on, essential proteins were mainly identified by centrality measures based on protein-protein interaction (PPI) networks, whose identification rate is limited by the many false positives in PPI networks. In this study, a purified PPI network is first introduced to reduce the impact of these false positives. Second, by analyzing the similarity between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Third, a protein subcellular localization score and an ortholog score are calculated from subcellular localization data and orthologous data, respectively. Fourth, an analysis of a large number of multi-feature fusion methods reveals a special relationship among features, termed the dominance relationship, on which a novel fusion model is built. Finally, NSC, the subcellular localization score, and the ortholog score are fused by the dominance relationship model, yielding a new method called NSO. To verify its performance, NSO is compared with seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, and WDC) on yeast datasets. The experimental results show that NSO achieves a higher identification rate than the other methods.
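
The abstract does not define the dominance relationship precisely. One natural reading, assumed here, is Pareto dominance: protein a dominates protein b if a scores at least as high as b on every feature (NSC, subcellular localization score, ortholog score) and strictly higher on at least one. The sketch ranks proteins by how many others they dominate; the scores are made up, and NSO's actual fusion rule may differ.

```python
def dominates(a, b):
    """Pareto dominance: a >= b on every feature and a > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_rank(scores):
    """Rank proteins by how many other proteins they dominate.

    scores: dict mapping protein ID -> tuple of feature scores, e.g.
            (NSC, subcellular localization score, ortholog score)
    Returns (ranking, counts).
    """
    counts = {p: sum(1 for q in scores if q != p and dominates(scores[p], scores[q]))
              for p in scores}
    # Higher dominance count first; ties broken by protein ID.
    ranking = sorted(scores, key=lambda p: (-counts[p], p))
    return ranking, counts


# Hypothetical (NSC, localization, ortholog) scores for four proteins.
scores = {
    "P1": (0.9, 0.8, 0.7),
    "P2": (0.5, 0.9, 0.6),
    "P3": (0.4, 0.3, 0.2),
    "P4": (0.6, 0.7, 0.1),
}
ranking, counts = dominance_rank(scores)
print(ranking, counts)
```

P2 ranks above P4 even though P4 has the higher NSC value: P2 dominates P3, whereas P4 dominates no other protein because of its low ortholog score.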






  • Article type: Journal Article
    Streptococcus pneumoniae is a notorious Gram-positive pathogen that is carried asymptomatically in the human nasopharynx. According to the World Health Organization (WHO), pneumococcus causes approximately one million deaths each year. Antibiotic resistance in S. pneumoniae is raising considerable concern worldwide, and the major problems arising from persistent S. pneumoniae infections need to be addressed urgently. In the present study, subtractive proteomics was used to narrow the entire proteome of the pathogen, consisting of 1947 proteins, down to a small number of possible targets, and various bioinformatics tools and software were applied to discover novel inhibitors. CD-HIT analysis identified 1887 non-redundant sequences in the proteome. These non-redundant proteins were searched with BLASTp against the human proteome, and 1423 proteins were identified as non-homologous. Next, the Database of Essential Genes (DEG) and J browser identified approximately 171 essential proteins. These non-homologous essential proteins were then analyzed in the KEGG Pathway Database, which shortlisted six unique proteins. The subcellular localization of these unique proteins was checked, and the cytoplasmic proteins were chosen for druggability analysis. This yielded three proteins, namely DNA binding response regulator (SPD_1085), UDP-N-acetylmuramate-L-alanine ligase (SPD_1349), and RNA polymerase sigma factor (SPD_0958), which are promising drug targets for limiting the toxicity caused by S. pneumoniae. The 3D structures of these proteins were predicted by homology modeling with SWISS-MODEL. 
Molecular docking with PyRx (version 0.8) was then used to screen a library of phytochemicals retrieved from the PubChem and ZINC databases, together with approved drugs from the DrugBank database, against the novel druggable targets to assess their binding affinity for the receptor proteins. The top two molecules for each receptor protein were selected based on binding affinity, RMSD value, and the highest-ranked conformation. Finally, absorption, distribution, metabolism, excretion, and toxicity (ADMET) analyses were carried out using the SwissADME and ProTox tools. This research supports the discovery of cost-effective drugs against S. pneumoniae. However, further in vivo and in vitro research should be conducted on these targets and the selected molecules to investigate their pharmacological efficacy and their function as efficient inhibitors.
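
The filtering funnel described above (1947 → 1887 → 1423 → about 171 → 6 → 3 proteins) amounts to a chain of set operations. The sketch below assumes the outputs of CD-HIT, BLASTp, the essential-gene lookup, KEGG, and the localization predictor have already been collected as plain Python sets and dicts; it does not call any of those tools, omits the final druggability step, and uses hypothetical protein IDs.

```python
def subtractive_filter(proteome, human_homologs, essential, unique_pathway, localization):
    """Apply the subtractive-proteomics filters in order.

    proteome:       non-redundant pathogen protein IDs (e.g. after CD-HIT)
    human_homologs: IDs with a significant BLASTp hit in the human proteome
    essential:      IDs matched to essential genes
    unique_pathway: IDs in pathways absent from the host
    localization:   dict of ID -> predicted subcellular location
    Returns the surviving IDs and the size after each stage.
    """
    current = set(proteome)
    stages = [("non-redundant", len(current))]
    current -= set(human_homologs)
    stages.append(("non-homologous to human", len(current)))
    current &= set(essential)
    stages.append(("essential", len(current)))
    current &= set(unique_pathway)
    stages.append(("pathogen-specific pathway", len(current)))
    current = {p for p in current if localization.get(p) == "cytoplasm"}
    stages.append(("cytoplasmic", len(current)))
    return current, stages


# Hypothetical protein IDs standing in for real tool outputs.
proteome = {f"p{i}" for i in range(1, 9)}
human_homologs = {"p1", "p2"}
essential = {"p1", "p3", "p4", "p5", "p6"}
unique_pathway = {"p4", "p5", "p6", "p8"}
localization = {"p4": "cytoplasm", "p5": "membrane", "p6": "cytoplasm"}

targets, stages = subtractive_filter(proteome, human_homologs, essential,
                                     unique_pathway, localization)
for name, size in stages:
    print(f"{name}: {size}")
print(sorted(targets))
```

Recording the size after every stage mirrors how such studies report the funnel, and makes it easy to see which filter removes the most candidates.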






  • Article type: Journal Article
    Gonorrhea is an urgent antimicrobial resistance threat, and its therapeutic options are becoming increasingly restricted. Moreover, no vaccine against it has been approved so far. Hence, the present study aimed to introduce novel immunogenic and drug targets against antibiotic-resistant Neisseria gonorrhoeae strains. First, the core proteins of 79 complete N. gonorrhoeae genomes were retrieved. Next, the surface-exposed proteins were evaluated from several aspects, such as antigenicity, allergenicity, conservancy, and B-cell and T-cell epitopes, to identify promising immunogenic candidates. Their interactions with human Toll-like receptors (TLR-1, TLR-2, and TLR-4) and their immunoreactivity in eliciting humoral and cellular immune responses were then simulated. In parallel, to identify novel broad-spectrum drug targets, the cytoplasmic and essential proteins were detected, the N. gonorrhoeae metabolome-specific proteins were compared with the drug targets in DrugBank, and novel drug targets were retrieved. Finally, Protein Data Bank (PDB) file availability and prevalence among the ESKAPE group and common sexually transmitted infection (STI) agents were assessed. Our analyses identified ten novel, putative immunogenic targets: murein transglycosylase A, PBP1A, Opa, NlpD, Azurin, MtrE, RmpM, LptD, NspA, and TamA. In addition, four potential broad-spectrum drug targets were identified: UMP kinase, GlyQ, HU family DNA-binding protein, and IF-1. Some of the shortlisted immunogenic and drug targets have confirmed roles in adhesion, immune evasion, and antibiotic resistance, and can induce bactericidal antibodies. Other immunogenic and drug targets might also be associated with the virulence of N. gonorrhoeae. Further experimental studies and site-directed mutagenesis are therefore recommended to investigate the role of the potential vaccine and drug targets in the pathogenesis of N. gonorrhoeae. 
These efforts to propose novel vaccine and drug targets appear to be paving the way for a prevention-and-treatment strategy against this bacterium. Additionally, combining bactericidal monoclonal antibodies with antibiotics is a promising approach to curing N. gonorrhoeae infection.
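
Two of the steps above reduce to simple set logic: the core proteome is the set of proteins present in all 79 genomes, and prevalence among the ESKAPE group or STI agents is the fraction of genomes carrying a given protein. The sketch assumes proteins have already been grouped into ortholog clusters with shared IDs; the cluster IDs and genomes below are hypothetical, and the clustering itself and the immunoinformatics analyses are not shown.

```python
from functools import reduce


def core_proteins(genomes):
    """Ortholog-cluster IDs present in every genome.

    genomes: dict mapping genome name -> iterable of cluster IDs
    """
    sets = [set(ids) for ids in genomes.values()]
    if not sets:
        return set()
    return reduce(set.intersection, sets)


def prevalence(cluster_id, genomes):
    """Fraction of genomes that carry `cluster_id`."""
    if not genomes:
        return 0.0
    return sum(1 for ids in genomes.values() if cluster_id in ids) / len(genomes)


# Hypothetical genomes described by ortholog-cluster IDs.
genomes = {
    "genome_1": {"OG1", "OG2", "OG3"},
    "genome_2": {"OG1", "OG3", "OG4"},
    "genome_3": {"OG1", "OG3"},
}
print(sorted(core_proteins(genomes)))
print(prevalence("OG2", genomes))
```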






  • Article type: Journal Article
    Many disease-related genes have been found to be associated with cancer diagnosis, which is useful for understanding the pathophysiology of cancer, generating targeted drugs, and developing new diagnostic and treatment techniques. With the development of the pan-cancer project and the ongoing advance of sequencing technology, many scientists are focusing on mining genes shared across cancer types from The Cancer Genome Atlas (TCGA). Motivated by the benefits of reverse genetics, in this study we attempted to infer pan-cancer-associated genes through homology matching with the microbial model organism Saccharomyces cerevisiae (yeast). First, a background protein-protein interaction network and a pathogenic gene set covering several cancer types were constructed for humans and yeast. Homology matching was then used to identify homologous human and yeast genes and to obtain their interaction sub-network, following the principle that homologous genes descended from a common ancestor may have similar expression. Next, using a bidirectional long short-term memory (BiLSTM) network combined with adaptive integration of heterogeneous information, we further explored the topological characteristics of the yeast protein interaction network and introduced a node representation score to evaluate the importance of each node in the graph. Finally, the important yeast genes identified by ensemble classifiers were mapped back to their human homologs, which may be regarded as genes connected to all types of cancer. The performance of the BiLSTM model can be assessed through experiments on the database, while enrichment analysis, survival analysis, and other analyses can be used to confirm the biological importance of the prediction results. The complete experimental protocols and programs are available at
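
The homology-mapping steps can be sketched as follows, assuming a precomputed yeast-to-human homolog table. The first function keeps only yeast interactions between genes that both have human homologs; the second takes the top-scoring yeast genes and maps them back to human genes. The gene IDs are hypothetical, and the scores are placeholders standing in for the BiLSTM-based node representation scores and ensemble classifiers, which are not reproduced.

```python
def homologous_subnetwork(yeast_edges, homologs):
    """Keep yeast PPI edges whose two genes both have a human homolog.

    yeast_edges: iterable of (yeast_gene, yeast_gene) pairs
    homologs:    dict mapping yeast gene -> set of human homolog IDs
    """
    return [(a, b) for a, b in yeast_edges if a in homologs and b in homologs]


def map_back_to_human(yeast_scores, homologs, top_k):
    """Return the human homologs of the top_k yeast genes by score."""
    # Highest score first; ties broken by gene ID for reproducibility.
    ranked = sorted(yeast_scores, key=lambda g: (-yeast_scores[g], g))[:top_k]
    human_genes = set()
    for gene in ranked:
        human_genes |= homologs.get(gene, set())
    return human_genes


# Hypothetical homolog table, interactions, and node scores.
homologs = {"Y1": {"H1"}, "Y2": {"H2", "H3"}, "Y3": {"H4"}}
yeast_edges = [("Y1", "Y2"), ("Y2", "Y3"), ("Y3", "Y9"), ("Y1", "Y3")]
scores = {"Y1": 0.4, "Y2": 0.9, "Y3": 0.7}

print(homologous_subnetwork(yeast_edges, homologs))
print(sorted(map_back_to_human(scores, homologs, top_k=2)))
```

A yeast gene can map to several human genes (Y2 here), so the human candidate list may be longer than `top_k`.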






  • Article type: Journal Article
    BACKGROUND: Essential proteins are indispensable for the development and survival of cells. Identifying essential proteins not only helps to reveal the minimal requirements for cell survival but also has practical significance in disease diagnosis, drug design, and medical treatment. With the rapid accumulation of protein-protein interaction (PPI) data, computationally identifying essential proteins from protein-protein interaction networks (PINs) has become increasingly popular, and a number of approaches for essential protein identification based on PINs have been developed to date.
    RESULTS: In this paper, we propose a new and effective approach called iMEPP that identifies essential proteins from PINs by fusing multiple types of biological data and applying the influence maximization mechanism to the PINs. Specifically, we first integrate PPI data, gene expression data, and Gene Ontology to construct weighted PINs, alleviating the impact of the high false-positive rate in raw PPI data. We then define the influence scores of nodes in PINs using both orthology data and PIN topological information. Finally, we develop an influence discount algorithm to identify essential proteins based on the influence maximization mechanism.
    CONCLUSIONS: We applied our method to identify essential proteins from the Saccharomyces cerevisiae PIN. Experiments show that iMEPP outperforms existing methods, which validates its effectiveness and advantages.
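
The abstract does not give iMEPP's influence discount algorithm. For orientation, the sketch below implements the classic DegreeDiscountIC heuristic of Chen, Wang, and Yang (2009) on an unweighted toy graph: after each seed is chosen, every unselected neighbour's score is discounted as dd_v = d_v - 2 t_v - (d_v - t_v) t_v p, where d_v is its degree, t_v its number of already-selected neighbours, and p an assumed uniform propagation probability. iMEPP instead works on weighted PINs with influence scores built from orthology and topology, so this is an analogous technique, not the authors' method.

```python
def degree_discount(neighbours, k, p=0.05):
    """Select up to k seed nodes with the DegreeDiscountIC heuristic.

    neighbours: dict mapping node -> set of adjacent nodes (undirected)
    p:          assumed uniform propagation probability
    """
    degree = {v: len(ns) for v, ns in neighbours.items()}
    discounted = dict(degree)                   # dd_v, initially the plain degree
    selected_nbrs = {v: 0 for v in neighbours}  # t_v
    seeds, chosen = [], set()
    for _ in range(min(k, len(neighbours))):
        candidates = sorted(v for v in neighbours if v not in chosen)
        # max() returns the first maximum, so ties go to the smallest ID.
        u = max(candidates, key=lambda v: discounted[v])
        seeds.append(u)
        chosen.add(u)
        for v in neighbours[u]:
            if v in chosen:
                continue
            selected_nbrs[v] += 1
            t = selected_nbrs[v]
            discounted[v] = degree[v] - 2 * t - (degree[v] - t) * t * p
    return seeds


# Hypothetical undirected interaction network.
neighbours = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "F"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A"},
    "F": {"B", "G", "H"},
    "G": {"F"},
    "H": {"F"},
}
print(degree_discount(neighbours, k=3, p=0.1))
```

On this graph, ranking by plain degree (breaking ties by ID) would pick A and then B, but because B is adjacent to the seed A its score is discounted, so F, which covers a different region of the network, is chosen second.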





