UniProt

Literature (31 articles)

1 Are Most Human-Specific Proteins Encoded by Long Noncoding RNAs?

Impact index: 3.973
Published: 25 Aug 2024
Journal: J Mol Evol; PMID: 38916610

DOI: 10.1007/s00239-024-10174-z
Article type: Journal Article

By looking for a lack of homologs in a reference database of 27 well-annotated proteomes of primates and 52 well-annotated proteomes of other mammals, 170 putative human-specific proteins were identified. While most of them are deemed uncertain, 2 are known at the protein level and 23 at the transcript level, according to UniProt. Interestingly, 23 of these 25 proteins are found to be encoded or to have close homologs in an open reading frame of a long noncoding human RNA. However, half of them are predicted to be at least 80% globular, with a single structural domain, according to IUPred, and with at least 80% of ordered residues, according to flDPnn. Strikingly, there is a near-complete lack of structural knowledge about these proteins, with no tertiary structure presently available in the Protein Data Bank and a fair prediction for one of them in the AlphaFold Protein Structure Database. Moreover, knowledge about the function of these possibly key proteins remains scarce.

2 In-silico characterization of GABAT protein found in gut-brain axis associated bacteria of healthy individuals and multiple sclerosis patients.

Impact index: 4.052
Published: Apr 2024
Journal: Saudi J Biol Sci; PMID: 38352114

DOI: 10.1016/j.sjbs.2024.103939
Article type: Journal Article

Multiple sclerosis (MS) is a neurodegenerative disease characterized by inflammation and demyelination of neurons. There is evidence to suggest that the level of the neurotransmitter gamma-aminobutyric acid (GABA) is reduced in certain areas of the brain in MS patients, due to degradation by γ-aminobutyric acid transaminase (GABAT). MS is always accompanied by gut bacteria dysbiosis. Faecalibacterium sp. is found abundantly in healthy individuals, while A. calcoaceticus, Clostridium sp. and S. typhimurium are found abundantly in MS patients. Although all these microbes produce GABAT, the enzyme significantly degrades GABA only in MS patients.
The present study is an attempt to characterize the GABAT protein sequences of these bacteria.
Sequences of GABAT protein were retrieved from the UniProt database. Sequences were analyzed with ProtParam, Gneg-mPLoc, SOSUI, PFP-FunDSeqE, the Pepwheel program, PROTEUS, the AlphaFold and SAVES servers, the MEME suite and the HDOCK server.
In the gastrointestinal tract (GIT) bacteria of healthy individuals, GABAT protein was present in the inner membrane, with α-helix content of 61 and 62%, β-sheet content of 5%, and 4-helical cytokine functional domains. It has a greater number of B-cell epitopes and a more complex 3D configuration than the enzymes of GIT bacteria from MS patients.
The present study might enable modification of the GABAT-encoding gene and enzyme through site-directed mutagenesis in pathogenic bacteria, thus reducing their potential to cause MS.

3 Perspectives of Proteomics in Respiratory Allergic Diseases.

Impact index: 6.208
Published: 18 Aug 2023
Journal: Int J Mol Sci; PMID: 37629105

DOI: 10.3390/ijms241612924
Article type: Journal Article

Proteomics in respiratory allergic diseases has such a battery of techniques and programs that one would almost think there is nothing impossible to find, invent or mold. All the resources that we document here are involved in solving problems in allergic diseases, both diagnostic and prognostic treatment, and immunotherapy development. The main perspectives, according to this version, are in three strands and/or a lockout immunological system: (1) Blocking the diapedesis of the cells involved, (2) Modifications and blocking of paratopes and epitopes being understood by modifications to antibodies, antagonisms, or blocking them, and (3) Blocking FcεRI high-affinity receptors to prevent specific IgEs from sticking to mast cells and basophils. These tools and targets in the allergic landscape are, in our view, the prospects in the field. However, there are still many allergens to identify, including some homologies between allergens and cross-reactions, through the identification of structures and epitopes. The current vision of using proteomics for this purpose remains a constant; this is also true for the basis of diagnostic and controlled systems for immunotherapy. Ours is an open proposal to use this vision for treatment.

4 UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship.

Impact index: not yet available
Published: 8 Aug 2023
Journal: Mol Cell Proteomics; PMID: 37301379

DOI: 10.1016/j.mcpro.2023.100591
Article type: Review

The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.

5 UniProt Tools: BLAST, Align, Peptide Search, and ID Mapping.

Impact index: not yet available
Published: Mar 2023
Journal: Curr Protoc; PMID: 36943033

DOI: 10.1002/cpz1.697
Article type: Journal Article

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data (UniProt Consortium, 2023). The UniProt website receives about 800,000 unique visitors per month and is the primary means to access UniProt. Along with various datasets that you can search, UniProt provides four main tools. These are the "BLAST" tool for sequence similarity searching, the "Align" tool for multiple sequence alignment, the "Peptide Search" tool for retrieving proteins containing a short peptide sequence, and the "Retrieve/ID Mapping" tool for using a list of identifiers to retrieve UniProt Knowledgebase (UniProtKB) proteins and to convert database identifiers from UniProt to external databases or vice versa. This article provides four basic protocols and seven alternate protocols for using UniProt tools. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Basic local alignment search tool (BLAST) in UniProt
Alternate Protocol 1: BLAST through UniProt text search results pages
Alternate Protocol 2: BLAST through UniProt basket
Basic Protocol 2: Multiple sequence alignment in UniProt
Alternate Protocol 3: Align tool through UniProt results pages and entry pages
Alternate Protocol 4: Align tool through UniProt basket
Basic Protocol 3: Peptide search in UniProt
Basic Protocol 4: Batch retrieval and ID mapping in UniProt
Alternate Protocol 5: Retrieve/ID Mapping tool through UniProt text search results pages and BLAST and Align results pages
Alternate Protocol 6: Retrieve/ID Mapping tool through UniProt basket
Alternate Protocol 7: Retrieve/ID Mapping tool through UniProt search box
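
The Retrieve/ID Mapping tool described in this abstract can also be driven programmatically. The following is a minimal sketch, not taken from the article: it only prepares the HTTP request for an ID mapping job and does not send it. The endpoint URL, the form field names (`from`, `to`, `ids`) and the database names `UniProtKB_AC-ID` and `PDB` are assumptions based on the public UniProt REST interface and should be checked against the current UniProt documentation; the helper name `build_id_mapping_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the UniProt ID mapping service (verify before use).
ID_MAPPING_RUN_URL = "https://rest.uniprot.org/idmapping/run"


def build_id_mapping_request(ids, from_db="UniProtKB_AC-ID", to_db="PDB"):
    """Prepare (but do not send) a POST request that submits an ID mapping job.

    The database names are assumptions; the service defines the valid values.
    """
    # Drop empty entries and surrounding whitespace from the identifier list.
    cleaned = [identifier.strip() for identifier in ids if identifier.strip()]
    if not cleaned:
        raise ValueError("at least one identifier is required")
    form = {"from": from_db, "to": to_db, "ids": ",".join(cleaned)}
    data = urlencode(form).encode("ascii")
    return Request(ID_MAPPING_RUN_URL, data=data, method="POST")


# Example: map two UniProtKB accessions to PDB identifiers.
request = build_id_mapping_request(["P05067", " P12345 ", ""])
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the prepared request to `urllib.request.urlopen` would submit the job. Keeping construction separate from sending makes the payload easy to inspect before any network traffic occurs.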

6 Searching and Navigating UniProt Databases.

Impact index: not yet available
Published: Mar 2023
Journal: Curr Protoc; PMID: 36912607

DOI: 10.1002/cpz1.700
Article type: Journal Article

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt website receives about 800,000 unique visitors per month and is the primary means to access UniProt. It provides 10 searchable datasets and four main tools. The key UniProt datasets are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), the UniProt Archive (UniParc), and protein sets for completely sequenced genomes (Proteomes). Other supporting datasets include information about proteins that is present in UniProtKB protein entries, such as literature citations, taxonomy, and subcellular locations, among others. This article focuses on how to use UniProt datasets. The first basic protocol describes navigation and searching mechanisms for the UniProt datasets, and two additional protocols build on the first protocol to describe advanced search and query building. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Searching UniProt datasets
Basic Protocol 2: Advanced search and query building
Basic Protocol 3: Adding parameters using advanced search
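
The query building covered by these protocols can be illustrated with a small sketch that assembles a search URL for the UniProtKB dataset. It is not taken from the article. The endpoint, the query field names (`gene`, `organism_id`, `reviewed`) and the return field names (`accession`, `gene_names`) are assumptions based on the public UniProt REST interface; the helper names are hypothetical. No request is sent.

```python
from urllib.parse import urlencode

# Assumed search endpoint for the UniProtKB dataset (verify before use).
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_query(terms):
    """Join field:value pairs with AND, as in an advanced search."""
    return " AND ".join(f"{field}:{value}" for field, value in terms.items())


def build_search_url(terms, fields=("accession", "gene_names"), fmt="tsv", size=25):
    """Return a search URL; the parameter names are assumed, not confirmed here."""
    params = {
        "query": build_query(terms),
        "format": fmt,
        "fields": ",".join(fields),
        "size": size,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


# Example: reviewed human entries for one gene.
terms = {"gene": "BRCA1", "organism_id": "9606", "reviewed": "true"}
query = build_query(terms)
url = build_search_url(terms)
print(query)
print(url)
```

Each added key in `terms` narrows the result set, which mirrors adding a parameter row in the advanced search form.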

7 Redefining the catalytic HECT domain boundaries for the HECT E3 ubiquitin ligase family.

Impact index: 3.976
Published: 28 Oct 2022
Journal: Biosci Rep; PMID: 36111624

DOI: 10.1042/BSR20221036
Article type: Journal Article

There are 28 unique human members of the homologous to E6AP C-terminus (HECT) E3 ubiquitin ligase family. Each member of the HECT E3 ubiquitin ligases contains a conserved bilobal HECT domain of approximately 350 residues found near their C-termini that is responsible for their respective ubiquitylation activities. Recent studies have begun to elucidate specific roles that each HECT E3 ubiquitin ligase has in various cancers, age-induced neurodegeneration, and neurological disorders. New structural models have been recently released for some of the HECT E3 ubiquitin ligases, but many HECT domain structures have yet to be examined due to chronic insolubility and/or protein folding issues. Building on these recently published structural studies coupled with our in-house experiments discussed in the present study, we suggest that the addition of ∼50 conserved residues N-terminal to the current UniProt-defined boundaries of the HECT domain is required for isolating soluble, stable, and active HECT domains. Using in silico bioinformatic analyses coupled with secondary structure prediction software, we show that this predicted N-terminal α-helix, found in all 28 human HECT E3 ubiquitin ligases, forms an obligate amphipathic α-helix that binds to a hydrophobic pocket found within the HECT N-terminal lobe. The present study proposes redefining the residue boundaries of the HECT domain to include this N-terminal extension, which will likely be critical for future biochemical, structural, and therapeutic studies on the HECT E3 ubiquitin ligase family.

8 Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation.

Impact index: 5.37
Published: 3 Jun 2022
Journal: J Proteome Res; PMID: 35532924

DOI: 10.1021/acs.jproteome.2c00131
Article type: Journal Article

Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence (the majority of sites in PSP). We estimate that around 62,000 Ser, 8,000 Thr, and 12,000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86,000 Ser, 50,000 Thr, and 26,000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.
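
The counts quoted in this abstract can be turned into a rough per-residue false-positive fraction. The sketch below is plain arithmetic on the abstract's numbers, not the authors' FDR estimation method. It assumes the "likely true" and "likely false-positive" counts are disjoint and together cover all sites considered. The resulting fractions are therefore distinct from the 84, 98, and 82% FDRs the abstract reports for single-evidence sites.

```python
# Counts quoted in the abstract (approximate, per residue type).
likely_true = {"Ser": 62_000, "Thr": 8_000, "Tyr": 12_000}
likely_false = {"Ser": 86_000, "Thr": 50_000, "Tyr": 26_000}


def false_fraction(true_count, false_count):
    """Fraction of sites that are false positives, given disjoint counts."""
    return false_count / (true_count + false_count)


# Per-residue fractions implied by the quoted counts.
fractions = {
    residue: false_fraction(likely_true[residue], likely_false[residue])
    for residue in likely_true
}

# Pooled fraction over all three residue types.
overall = false_fraction(sum(likely_true.values()), sum(likely_false.values()))

for residue, fraction in fractions.items():
    print(f"{residue}: {fraction:.1%} of sites implied false positive")
print(f"Overall: {overall:.1%}")
```

Under these assumptions, threonine sites carry the highest implied false-positive share, since only 8,000 of roughly 58,000 are estimated to be true.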

9 ProSight Annotator: Complete control and customization of protein entries in UniProt XML files.

Impact index: 5.393
Published: Jun 2022
Journal: Proteomics; PMID: 35286768

DOI: 10.1002/pmic.202100209
Article type: Journal Article

The effectiveness of any proteomics database search depends on the theoretical candidate information contained in the protein database. Unfortunately, candidate entries from protein databases such as UniProt rarely contain all the post-translational modifications (PTMs), disulfide bonds, or endogenous cleavages of interest to researchers. These omissions can limit discovery of novel and biologically important proteoforms. Conversely, searching for a specific proteoform becomes a computationally difficult task for heavily modified proteins. Both situations require updates to the database through user-annotated entries. Unfortunately, manually creating properly formatted UniProt Extensible Markup Language (XML) files is tedious and prone to errors. ProSight Annotator solves these issues by providing a graphical interface for adding user-defined features to UniProt-formatted XML files for better informed proteoform searches. It can be downloaded from http://prosightannotator.northwestern.edu.
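
To show the kind of manual edit the abstract calls tedious and error-prone, here is a minimal sketch that adds one user-defined feature to a UniProt-style XML entry using Python's standard library. It is not ProSight Annotator's implementation. The entry is a made-up, heavily simplified example (the accession `X00001`, entry name and sequence are hypothetical), and the element layout (`feature`, `location`, `position`) and namespace are assumed to follow the UniProt XML format. A real file has many more elements and should be validated against the official schema.

```python
import xml.etree.ElementTree as ET

# Assumed UniProt XML namespace.
NS = "http://uniprot.org/uniprot"
ET.register_namespace("", NS)

# Hypothetical, heavily simplified entry for illustration only.
ENTRY_XML = f"""<uniprot xmlns="{NS}">
  <entry>
    <accession>X00001</accession>
    <name>EXAMPLE_HUMAN</name>
    <sequence length="12">MKTSAYIAKQRQ</sequence>
  </entry>
</uniprot>"""


def add_point_feature(entry, feature_type, description, position):
    """Insert a single-residue feature just before the <sequence> element."""
    feature = ET.Element(
        f"{{{NS}}}feature", {"type": feature_type, "description": description}
    )
    location = ET.SubElement(feature, f"{{{NS}}}location")
    ET.SubElement(location, f"{{{NS}}}position", {"position": str(position)})
    children = list(entry)
    # Keep <sequence> as the last child; append if no sequence is present.
    index = next(
        (i for i, child in enumerate(children) if child.tag == f"{{{NS}}}sequence"),
        len(children),
    )
    entry.insert(index, feature)
    return feature


root = ET.fromstring(ENTRY_XML)
entry = root.find(f"{{{NS}}}entry")
# Residue 4 of the example sequence is a serine.
add_point_feature(entry, "modified residue", "Phosphoserine", 4)
print(ET.tostring(root, encoding="unicode"))
```

Even in this small case the namespace handling and element ordering need care, which is the kind of detail a graphical tool can hide from the user.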

10 IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds.

Impact index: 6.208
Published: 2 Dec 2021
Journal: Int J Mol Sci; PMID: 34884870

DOI: 10.3390/ijms222313066
Article type: Journal Article

Parasite species of the genus Plasmodium cause malaria, which remains a major global health problem due to parasite resistance to available antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect that the success of a pre-clinical assay depends on the conditions of the assay per se, the chemical structure of the drug, the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. Some of the difficulties for this kind of analysis are the dispersion of data in different datasets, the high heterogeneity of data, etc. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information-Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a huge amount of information about the conditions of assays. The NCBI-GDV and UniProt datasets include the sequences of genes, proteins, and their functions. In addition, we also created two partitions (cassayj = caj and cdataj = cdj) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about experimental conditions of preclinical assays (caj) or about the nature and quality of data (cdj). These categorical variables include information about 22 parameters of biological activity (ca0), 28 target proteins (ca1), and 9 organisms of assay (ca2), etc. We also created another partition (cprotj = cpj) including categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp0), 10 chromosomes (cp1), gene orientation (cp2), and 31 protein functions (cp3). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to fuse all this information (from three databases) and train a predictive model. Shannon's entropy measure Shk (numerical variables) was used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes in the same information scale. Perturbation Theory Operators (PTOs) with the form of Moving Average (MA) operators have been used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model showed the best performance, with Sensitivity Sn(%) = 83.6/85.1 and Specificity Sp(%) = 89.8/89.7 for training/validation sets, respectively. This model could become a useful tool for the optimization of preclinical assays of new antimalarial compounds vs. different proteins in the proteome of Plasmodium.
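
Two quantities named in this abstract can be illustrated with a small sketch: a moving-average style deviation of a numeric descriptor from its expected value within a categorical subset, and sensitivity and specificity computed from confusion-matrix counts. This is a simplified reading of the abstract, not the authors' IFPTML implementation. The toy values, category labels and function names are hypothetical, and the expected value is taken here as the plain arithmetic mean of each subset.

```python
from collections import defaultdict


def moving_average_deviations(values, categories):
    """Return value minus the mean of its category, for each observation."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, category in zip(values, categories):
        sums[category] += value
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    return [value - means[category] for value, category in zip(values, categories)]


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity as percentages from confusion counts."""
    return 100 * tp / (tp + fn), 100 * tn / (tn + fp)


# Toy descriptor values grouped by a hypothetical assay-condition label.
descriptor = [1.0, 3.0, 2.0, 6.0]
condition = ["IC50", "IC50", "Ki", "Ki"]
deviations = moving_average_deviations(descriptor, condition)
print(deviations)

# Toy confusion-matrix counts (not the paper's data).
sn, sp = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
print(sn, sp)
```

Centering each descriptor on its subset mean lets one model compare compounds measured under different assay conditions on a common scale, which is the role the abstract assigns to the MA operators.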
