ChEMBL

  • Article type: Journal Article
    Membrane permeability is an in vitro parameter that represents the apparent permeability (Papp) of a compound, and is a key absorption, distribution, metabolism, and excretion parameter in drug development. Although Caco-2 cell lines are the most commonly used cell lines to measure Papp, other cell lines, such as the Madin-Darby Canine Kidney (MDCK), LLC-Pig Kidney 1 (LLC-PK1), and Ralph Russ Canine Kidney (RRCK) cell lines, can also be used to estimate Papp. Therefore, constructing in silico models for Papp estimation using the MDCK, LLC-PK1, and RRCK cell lines requires collecting extensive amounts of in vitro Papp data. An open database offers extensive measurements of various compounds covering a vast chemical space; however, concerns have been reported about using data published in open databases without appropriate accuracy and quality checks. Ensuring the quality of datasets for training in silico models is critical because artificial intelligence (AI, including deep learning) is used to develop models that predict various pharmacokinetic properties, and data quality affects the performance of these models. Hence, careful curation of the collected data is imperative. Herein, we developed a new workflow, built with KNIME, that supports automatic curation of Papp data measured in the MDCK, LLC-PK1, and RRCK cell lines and collected from ChEMBL. The workflow consists of four main phases. Data were extracted from ChEMBL and filtered to identify the target protocols. A total of 1661 high-quality entries were retained after checking 436 articles. The workflow is freely available, can be updated, and has high reusability. Our study provides a novel approach for data quality analysis and accelerates the development of helpful in silico models for effective drug discovery. Scientific Contribution: The cost of building highly accurate predictive models can be significantly reduced by automating the collection of reliable measurement data. Our tool reduces the time and effort required for data collection and will enable researchers to focus on constructing high-performance in silico models for other types of analysis. To the best of our knowledge, no such tool is available in the literature.
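The filtering step mentioned in the abstract above (keeping only Papp measurements from the MDCK, LLC-PK1, and RRCK cell lines) might look roughly like the following sketch. It is not the authors' KNIME workflow: the record layout, the field names `assay_description` and `papp_value`, and the simple keyword match are all assumptions made for illustration, and they do not reflect the ChEMBL schema.

```python
# Illustrative sketch only. The field names below are hypothetical and are
# not taken from the ChEMBL schema or from the published KNIME workflow.
TARGET_CELL_LINES = ("MDCK", "LLC-PK1", "RRCK")


def matches_target_cell_line(description):
    """Return the first target cell line named in an assay description, or None."""
    text = description.upper()
    for name in TARGET_CELL_LINES:
        if name in text:
            return name
    return None


def filter_papp_records(records):
    """Keep records that name a target cell line and carry a positive Papp value."""
    kept = []
    for rec in records:
        cell_line = matches_target_cell_line(rec.get("assay_description", ""))
        value = rec.get("papp_value")
        # Drop records with no recognised cell line or no usable measurement.
        if cell_line is None or value is None or value <= 0:
            continue
        kept.append({**rec, "cell_line": cell_line})
    return kept
```

A keyword match of this kind is only a first pass; the abstract indicates that the retained entries were also checked against the source articles, which a sketch like this does not replace.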

  • Article type: Journal Article
    We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24-25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and academics from seventy-nine countries registered for the meeting. As part of the meeting, advances in enumeration and visualization of chemical space, applications in natural product-based drug discovery, drug discovery for neglected diseases, toxicity prediction, and general guidelines for data analysis were discussed. Experts from ChEMBL presented a workshop on how to use the resources of this major compounds database used in cheminformatics. The school also included a round table with editors of cheminformatics journals. The full program of the meeting and the recordings of the sessions are publicly available at https://www.youtube.com/@SchoolChemInfLA/featured .

  • Article type: Journal Article
    Public repositories containing compound-bioactivity data for millions of small molecules offer a valuable resource for chemogenomic compound candidate search. Nonetheless, owing to nonuniform data mining, these databases are often incomplete, which argues for the combined use of data from several repositories to increase target coverage and data accuracy. Here, we present a workflow to generate custom datasets from public databases for mining chemogenomic compound candidates. The compiled set provides flags for differences in structural and bioactivity data and enables rapid extraction of potent and selective bioactive compounds.

  • Article type: Journal Article
    Schistosomiasis is one of the most important neglected tropical diseases. Until an effective vaccine is registered for use, the cornerstone of schistosomiasis control remains chemotherapy with praziquantel. The sustainability of this strategy is at substantial risk due to the possibility of praziquantel insensitive/resistant schistosomes developing. Considerable time and effort could be saved in the schistosome drug discovery pipeline if available functional genomics, bioinformatics, cheminformatics and phenotypic resources are systematically leveraged. Our approach, described here, outlines how schistosome-specific resources/methodologies, coupled to the open-access drug discovery database ChEMBL, can be cooperatively used to accelerate early-stage, schistosome drug discovery efforts. Our process identified seven compounds (fimepinostat, trichostatin A, NVP-BEP800, luminespib, epoxomicin, CGP60474 and staurosporine) with ex vivo anti-schistosomula potencies in the sub-micromolar range. Three of those compounds (epoxomicin, CGP60474 and staurosporine) also demonstrated potent and fast-acting ex vivo effects on adult schistosomes and completely inhibited egg production. ChEMBL toxicity data were also leveraged to provide further support for progressing CGP60474 (as well as luminespib and TAE684) as a novel anti-schistosomal compound. As very few compounds are currently at the advanced stages of the anti-schistosomal pipeline, our approaches highlight a strategy by which new chemical matter can be identified and quickly progressed through preclinical development.

  • Article type: Journal Article
    Diabetes is a chronic hyperglycemic disorder that leads to a group of metabolic diseases. This condition of chronic hyperglycemia is caused by abnormal insulin levels. The impact of hyperglycemia on the human vascular tree is the leading cause of disease and death in type 1 and type 2 diabetes. People with type 2 diabetes mellitus (T2DM) have abnormal insulin secretion as well as abnormal insulin action. Type 2 (non-insulin-dependent) diabetes is caused by a combination of genetic factors associated with decreased insulin production, insulin resistance, and environmental conditions. These conditions include overeating, lack of exercise, obesity, and aging. Glucose transport limits the rate at which dietary glucose is used by fat and muscle. The glucose transporter GLUT4 is kept intracellular and sorted dynamically, and GLUT4 translocation or insulin-regulated vesicular traffic distributes it to the plasma membrane. Different chemical compounds have antidiabetic properties. The complexity, metabolism, digestion, and interaction of these chemical compounds make it difficult to understand and apply them to reduce chronic inflammation and thus prevent chronic disease. In this study, we applied a virtual screening approach to identify the most suitable and druggable chemical compounds for use as potential drug candidates against T2DM. Of the 5000 chemical compounds that we analyzed, only two were found to be more effective, based on molecular docking studies and virtual screening through Lipinski's rule and ADMET properties.
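The Lipinski filter named in the abstract above can be sketched as a simple check on four precomputed descriptors. This is a minimal illustration, not the screening pipeline used in the study: the function names are hypothetical, the descriptor values are assumed to come from a separate cheminformatics toolkit, and the ADMET and docking steps are not covered.

```python
# Minimal sketch of Lipinski's rule of five on precomputed descriptors.
# Function names are illustrative; descriptor calculation is out of scope.

def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four rule-of-five limits a compound exceeds."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol/water logP above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def passes_lipinski(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """Accept a compound with at most `max_violations` rule-of-five violations."""
    count = lipinski_violations(mol_weight, logp, h_donors, h_acceptors)
    return count <= max_violations
```

Allowing one violation by default follows a common reading of the rule; a stricter screen can pass `max_violations=0`.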

  • Article type: Journal Article
    Currently, G protein-coupled receptors (GPCRs) constitute a significant group of membrane-bound receptors representing more than 30% of therapeutic targets. Fluorine is commonly used in designing highly active biological compounds, as evidenced by the steadily increasing number of fluorinated drugs approved by the Food and Drug Administration (FDA). Herein, we identified and analyzed 898 target-based F-containing isomeric analog sets for SAR analysis in the ChEMBL database (FiSAR sets) that are active against 33 different aminergic GPCRs and comprise a total of 2163 fluorinated (1201 unique) compounds. We found that 30 FiSAR sets contain activity cliffs (ACs), defined as pairs of structurally similar compounds showing significant differences in affinity (≥50-fold change), where a change of fluorine position may lead to up to a 1300-fold change in potency. The analysis of matched molecular pair (MMP) networks indicated that the fluorination of aromatic rings showed no clear trend toward a positive or negative effect on affinity. Additionally, we propose an in silico workflow (including induced-fit docking, molecular dynamics, quantum polarized ligand docking, and binding free energy calculations based on the Generalized-Born Surface-Area (GBSA) model) to score the fluorine positions in the molecule.
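The activity-cliff criterion quoted above (a fold change of at least 50 in affinity between structurally similar compounds) can be illustrated with a short sketch. It assumes that all members of one set are already known to be positional isomers, so no similarity calculation is done, and that affinities are given as positive values in a common unit such as nM. The function names and the example identifiers are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def fold_change(a, b):
    """Ratio of the larger to the smaller affinity value (both must be positive)."""
    if a <= 0 or b <= 0:
        raise ValueError("affinity values must be positive")
    return max(a, b) / min(a, b)


def find_activity_cliffs(affinities, threshold=50.0):
    """Return (id_a, id_b, fold) for each pair whose fold change meets the threshold.

    `affinities` maps a compound identifier to its affinity in one isomer set.
    """
    cliffs = []
    # Sort by identifier so that the pair order is deterministic.
    for (id_a, val_a), (id_b, val_b) in combinations(sorted(affinities.items()), 2):
        fold = fold_change(val_a, val_b)
        if fold >= threshold:
            cliffs.append((id_a, id_b, fold))
    return cliffs
```

For example, calling `find_activity_cliffs({"ortho-F": 2.0, "meta-F": 150.0, "para-F": 40.0})` flags only the ortho/meta pair, whose values differ 75-fold.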

  • Article type: Journal Article
    Small molecules are being used to inhibit cyclin-dependent kinase (CDK) enzymes in cancer treatment. There is evidence that CDK is a drug target for cancer therapy across many tumor types because it catalyzes the transfer of the terminal phosphate of ATP to a protein that acts as a substrate. Herein, we attempted to identify pyranopyrazoles that act as CDK inhibitors; their synthesis was catalyzed by nano-zirconium dioxide via a multicomponent reaction. Additionally, we performed, for the first time, an in situ analysis of the intermediates of the multicomponent reaction, which revealed that nano-zirconium dioxide stimulated the reaction, as estimated by Gibbs free energy calculations of spontaneity. Functionally, the novel pyranopyrazoles were tested for a loss of cell viability using human breast cancer cells (MCF-7). Compounds 5b and 5f effectively produced loss of viability of MCF-7 cells, with IC50 values of 17.83 and 23.79 μM, respectively. In vitro and in silico mode-of-action studies showed that pyranopyrazoles target CDK1 in human breast cancer cells, with lead compounds 5b and 5f having potent IC50 values of 960 nM and 7.16 μM, respectively. Hence, the newly synthesized bioactive pyranopyrazoles could serve as better structures to develop CDK1 inhibitors against human breast cancer cells.

  • Article type: Journal Article
    To analyze the Chimiothèque Nationale (CN), the French National Compound Library, in the context of screening and biologically relevant compounds, the library was compared with the ZINC in-stock collection and ChEMBL. This included the study of chemical space coverage, physicochemical properties, and Bemis-Murcko (BM) scaffold populations. More than 5K CN-unique scaffolds (relative to the ZINC and ChEMBL collections) were identified. Generative Topographic Maps (GTMs) accommodating those libraries were generated and used to compare the compound populations. Hierarchical GTM ("zooming") was applied to generate an ensemble of maps at various resolution levels, from global overview to precise mapping of individual structures. The respective maps were added to the ChemSpace Atlas website. The analysis of synthetic accessibility in the context of combinatorial chemistry showed that only 29.7% of CN compounds can be fully synthesized using commercially available building blocks.

  • Article type: Journal Article
    Antibacterial drugs (AD) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria increase interest in understanding metabolic network (MN) mutations and the interaction of AD vs MN. In this study, we employed the IFPTML = Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML) algorithm on a huge dataset from the ChEMBL database, which contains >155,000 AD assays vs >40 MNs of multiple bacteria species. We built a linear discriminant analysis (LDA) and 17 ML models centered on the linear index and based on atoms to predict antibacterial compounds. The IFPTML-LDA model presented the following results for the training subset: specificity (Sp) = 76% out of 70,000 cases, sensitivity (Sn) = 70%, and Accuracy (Acc) = 73%. The same model also presented the following results for the validation subsets: Sp = 76%, Sn = 70%, and Acc = 73.1%. Among the IFPTML nonlinear models, the k nearest neighbors (KNN) showed the best results with Sn = 99.2%, Sp = 95.5%, Acc = 97.4%, and Area Under Receiver Operating Characteristic (AUROC) = 0.998 in training sets. In the validation series, the Random Forest had the best results: Sn = 93.96% and Sp = 87.02% (AUROC = 0.945). The IFPTML linear and nonlinear models regarding the ADs vs MNs have good statistical parameters, and they could contribute toward finding new metabolic mutations in antibiotic resistance and reducing time/costs in antibacterial drug research.
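The statistics reported above (Sn, Sp, Acc) follow from the standard confusion-matrix definitions, sketched below. The function name is hypothetical and the counts in the usage note are invented to show the arithmetic; they are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: positives classified correctly/incorrectly,
    tn/fp: negatives classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)                 # Sn: fraction of positives recovered
    specificity = tn / (tn + fp)                 # Sp: fraction of negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # Acc: fraction of all cases correct
    return sensitivity, specificity, accuracy
```

With made-up, balanced counts of 70 true positives, 30 false negatives, 76 true negatives, and 24 false positives, the function returns Sn = 0.70, Sp = 0.76, and Acc = 0.73, which shows how figures of this shape relate to each other when the two classes are the same size.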

  • Article type: Journal Article
    In drug discovery, partition and distribution coefficients, logP and logD for octanol/water, are widely used as metrics of the lipophilicity of molecules, which in turn has a strong influence on the bioactivity and bioavailability of potential drugs. There are a variety of established methods, mostly fragment- or atom-based, to calculate logP, while logD prediction generally relies on calculated logP and pKa for the estimation of neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations that generally lead to systematic errors for chemically related molecules, while pKa estimation is generally more difficult due to the interplay of electronic, inductive, and conjugation effects for ionizable moieties. We propose an integrated machine learning QSAR modeling approach to predict logD by training the model with experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions over commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
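The dependence of logD on calculated logP and pKa mentioned above can be illustrated with the textbook relation for a compound with a single ionizable group, under the assumption that only the neutral species partitions into octanol. This is a sketch of that conventional baseline, not the machine learning correction model proposed in the abstract, and the function names are hypothetical.

```python
import math


def logd_monoprotic_acid(logp, pka, ph):
    """logD of a monoprotic acid, assuming only the neutral form enters octanol."""
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


def logd_monoprotic_base(logp, pka, ph):
    """logD of a monoprotic base; `pka` is that of the conjugate acid."""
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))
```

For an acid with logP = 3.0 and pKa = 4.4 at pH 7.4, the ionized form dominates by about a factor of 1000, so logD drops by roughly three log units to about 0. Errors in the predicted logP or pKa pass straight through this relation, which is the kind of systematic error the abstract's correction model aims to reduce.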