chemical space

  • Article type: Journal Article
    Protein synthesis methods have been adapted to incorporate an ever-growing range of non-natural components. Meanwhile, de novo design of protein structure and function has rapidly emerged as a viable capability. However, these two exciting trends have yet to intersect in a meaningful way. The ability to perform de novo design with non-proteinogenic components requires that synthesis and computation align on common targets and applications. This perspective examines the state of the art in these areas and identifies specific, consequential applications to advance the field toward generalized macromolecule design.

  • Article type: Journal Article
    For the past two decades, virtual screening (VS) has been an efficient hit-finding approach for drug discovery. Today, billions of commercially accessible compounds are routinely screened, and many successful examples of VS have been reported. VS methods continue to evolve, including machine learning and physics-based methods.
    The authors examine recent examples of VS in drug discovery and discuss prospective hit-finding results from the Critical Assessment of Computational Hit-finding Experiments (CACHE) challenge. The authors also highlight the cost considerations and open-source options for conducting VS and examine chemical space coverage and library selection for VS.
    Advances in sophisticated VS approaches, including machine learning techniques and increased computing resources, together with easy access to synthetically accessible chemical spaces and to commercial and open-source VS platforms, allow interrogation of ultra-large libraries (ULLs) of billions of molecules. An impressive number of prospective ULL VS campaigns have generated potent and structurally novel hits across many target classes. Nonetheless, many successful contemporary VS approaches still use considerably smaller focused libraries. This apparent dichotomy illustrates that VS is best conducted in a fit-for-purpose way, by choosing an appropriate chemical space. Better methods need to be developed to tackle more challenging targets.

  • Article type: Journal Article
    Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms that leverage knowledge from related tasks have shown potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets are generated for different purposes and cover distinct chemical spaces, conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method, MTForestNet, that can deal with data scarcity and learn from tasks with distinct chemical spaces. MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of MTForestNet, 48 zebrafish toxicity datasets were collected and used as an example. Among them, two tasks are very different from the others, sharing only 1.3% of their chemicals with other tasks. In an independent test, MTForestNet achieved a high area under the receiver operating characteristic curve (AUC) of 0.911 and outperformed the single-task and multitask methods it was compared with. The overall toxicity derived from the developed zebrafish toxicity models correlates well with the experimentally determined overall toxicity. In addition, the outputs of the developed zebrafish toxicity models can be used as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity, and the proposed MTForestNet is expected to be useful for other tasks with distinct chemical spaces.
    Scientific contribution: A novel multitask learning algorithm, MTForestNet, was proposed to address the challenge of developing models from datasets with distinct chemical spaces, which is a common issue in cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet, and they provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing.
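    The AUC figure quoted above is a standard ranking metric. As a rough illustration only (this is not the authors' evaluation code, and the function name `roc_auc` is an assumption made for this sketch), AUC can be computed as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half:

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: iterable of 0/1 class labels (1 = toxic, 0 = non-toxic, say).
    scores: iterable of predicted scores, higher meaning "more likely 1".
    Hypothetical helper for illustration; not taken from MTForestNet.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

    The pairwise form is quadratic in the number of samples, which is fine for a sketch; rank-based implementations are usually preferred for large test sets.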

  • Article type: Journal Article
    Helicobacter pylori is the main causative agent of gastric cancer, especially non-cardia gastric cancer. This bacterium relies on urease, which produces large amounts of ammonia, to colonize the host. This study provides insights into the structural patterns driving urease inhibition, with high-activity molecules designed by exploring known inhibitors. First, an ensemble model combining four machine learning approaches was devised to predict the inhibitory activity of novel compounds in an automated workflow (R² = 0.761). The dataset was characterized in terms of chemical space, including molecular scaffolds, clustering analysis, distributions of physicochemical properties, and activity cliffs. These analyses highlighted the hydroxamic acid group and the benzene ring as responsible for distinct activity. Activity cliff pairs revealed that substituents of the benzene ring on hydroxamic acid derivatives are key structures for substantial activity enhancement. Moreover, 11 hydroxamic acid derivatives were designed, named mol1-11. Molecular dynamics simulations showed that mol9 stabilized the closed conformation of the active-site flap; it is expected to be a promising drug candidate for Helicobacter pylori infection, pending further in vitro, in vivo, and clinical studies.
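    The abstract does not say how the four machine learning models are combined or how R² was evaluated. As a minimal sketch under the assumption of a plain unweighted average (the function names `ensemble_predict` and `r_squared` are hypothetical), an ensemble prediction and its coefficient of determination could be computed like this:

```python
def ensemble_predict(model_predictions):
    """Average per-compound predictions from several models.

    model_predictions: list of lists, one inner list per model, all the
    same length. Unweighted averaging is an assumption of this sketch;
    the study may combine its models differently.
    """
    n_models = len(model_predictions)
    return [sum(column) / n_models for column in zip(*model_predictions)]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data: two hypothetical models predicting activity for two compounds.
averaged = ensemble_predict([[1.0, 2.0], [3.0, 4.0]])
print(averaged)  # [2.0, 3.0]
```

    An R² of 1 means perfect agreement with the measured activities, and 0 means the model does no better than always predicting the mean.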

  • Article type: Journal Article
    Compound databases of natural products play a crucial role in drug discovery and development projects and have implications in other areas, such as food chemical research, ecology, and metabolomics. Recently, we put together the first version of the Latin American Natural Product Database (LANaPDB) as a collective effort of researchers from six countries to assemble a public and representative library of natural products from a geographical region with large biodiversity. The present work conducts a comparative and extensive profiling of the natural product-likeness of an updated version of LANaPDB and of the ten individual compound databases that form part of it. The natural product-likeness profile of the Latin American compound databases is contrasted with the profiles of other major natural product databases in the public domain and of a set of small-molecule drugs approved for clinical use. As part of the extensive characterization, we employed several chemoinformatics metrics of natural product-likeness. The results of this study will be of interest to the global community engaged in natural product databases, not only in Latin America but across the world.

  • Article type: Journal Article
    In the modern "omics" era, measurement of the human exposome is a critical missing link between genetic drivers and disease outcomes. High-resolution mass spectrometry (HRMS), routinely used in proteomics and metabolomics, has emerged as a leading technology for broadly profiling chemical exposure agents and related biomolecules, offering accurate mass measurement, high sensitivity, rapid data acquisition, and increased resolution of chemical space. Non-targeted approaches are increasingly accessible, supporting a shift from conventional hypothesis-driven, quantitation-centric targeted analyses toward data-driven, hypothesis-generating chemical exposome-wide profiling. However, HRMS-based exposomics encounters unique challenges. New analytical and computational infrastructures are needed to expand analysis coverage through streamlined, scalable, and harmonized workflows and data pipelines that permit longitudinal chemical exposome tracking, retrospective validation, and multi-omics integration for meaningful health-oriented inferences. In this article, we survey the literature on state-of-the-art HRMS-based technologies, review current analytical workflows and informatic pipelines, and provide an up-to-date reference on exposomic approaches for chemists, toxicologists, epidemiologists, care providers, and stakeholders in health sciences and medicine. We propose efforts to benchmark fit-for-purpose platforms for expanding coverage of chemical space, including gas/liquid chromatography-HRMS (GC-HRMS and LC-HRMS), and discuss opportunities, challenges, and strategies to advance the burgeoning field of the exposome.

  • Article type: Journal Article
    The exploration of chemical space is a fundamental aspect of chemoinformatics, particularly when one explores a large compound data set to relate chemical structures to molecular properties. In this study, we extend our previous work on chemical space visualization at the pharmacophoric level. Instead of using the conventional binary classification of affinity (active vs inactive), we introduce a refined approach that categorizes compounds into four distinct classes based on their activity levels: super active, very active, active, and inactive. This classification enriches the color scheme applied to pharmacophore space, where the color representation of a pharmacophore hypothesis is driven by the associated compounds. Using the BCR-ABL tyrosine kinase as a case study, we identified intriguing regions corresponding to pharmacophore activity discontinuities, providing valuable insights for structure-activity relationship analysis.
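    The four-class scheme described above can be sketched as a simple binning of an activity value. The abstract gives neither the activity measure nor the class boundaries, so the pIC50 scale, the cut-offs in `DEFAULT_THRESHOLDS`, and both function names below are assumptions made purely for illustration:

```python
from collections import Counter

# Hypothetical pIC50 cut-offs (active, very active, super active).
# The abstract does not state the actual boundaries used in the study.
DEFAULT_THRESHOLDS = (6.0, 7.0, 8.0)


def activity_class(pic50, thresholds=DEFAULT_THRESHOLDS):
    """Map an activity value onto one of the four classes named in the text."""
    active_min, very_active_min, super_active_min = thresholds
    if pic50 >= super_active_min:
        return "super active"
    if pic50 >= very_active_min:
        return "very active"
    if pic50 >= active_min:
        return "active"
    return "inactive"


def class_counts(pic50_values, thresholds=DEFAULT_THRESHOLDS):
    """Count how many compounds fall into each activity class."""
    return Counter(activity_class(v, thresholds) for v in pic50_values)


# Compounds associated with one (hypothetical) pharmacophore hypothesis.
print(class_counts([8.3, 7.0, 7.5, 5.0]))
```

    Per-hypothesis class counts like these are one plausible input for choosing a color; how the study actually derives the color from the associated compounds is not specified in the abstract.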

  • Article type: Journal Article
    Computational exploration of chemical space is crucial in modern cheminformatics research for accelerating the discovery of new biologically active compounds. In this study, we present a detailed analysis of a chemical library of potential glucocorticoid receptor (GR) ligands generated by the molecular generator Molpher. To generate the targeted GR library and construct the classification models, we used structures from the ChEMBL database as well as from the internal IMG library, which was experimentally screened for biological activity in a primary luciferase reporter cell assay. The composition of the targeted GR ligand library was compared with a reference library that randomly samples chemical space. A random forest model was used to predict the biological activity of ligands, with its applicability domain incorporated through conformal prediction. The GR library was shown to be significantly enriched in GR ligands compared with the random library. Furthermore, a prospective analysis demonstrated that Molpher successfully designed compounds that were subsequently confirmed experimentally to be active on the GR. A collection of 34 potential new GR ligands was also identified. Moreover, an important contribution of this study is a comprehensive workflow for evaluating computationally generated ligands, particularly those with potential activity against targets that are challenging to dock.
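    Conformal prediction, mentioned above as the way the applicability domain is handled, can be illustrated with a small class-conditional (Mondrian) inductive sketch. The abstract does not describe the study's nonconformity measure or significance level, so the choice of "1 minus predicted class probability" as the score, the 0.2 significance level, and the function names are all assumptions:

```python
def conformal_p_value(calibration_scores, test_score):
    """p-value of a test compound for one class.

    calibration_scores: nonconformity scores of held-out calibration
    compounds that truly belong to this class (assumed here to be
    1 - predicted probability of the class from some classifier).
    test_score: the same score computed for the test compound.
    """
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_by_class, test_scores_by_class, significance=0.2):
    """Return every class whose p-value exceeds the significance level.

    An empty set flags a compound that looks unlike anything seen in
    calibration, i.e. one that is likely outside the applicability domain.
    """
    return {
        label
        for label, calibration in calibration_by_class.items()
        if conformal_p_value(calibration, test_scores_by_class[label]) > significance
    }


# Toy calibration scores for two classes and one test compound.
calibration = {
    "active": [0.1, 0.2, 0.3, 0.4],
    "inactive": [0.1, 0.2, 0.3, 0.4],
}
test_scores = {"active": 0.25, "inactive": 0.9}
print(prediction_set(calibration, test_scores))  # {'active'}
```

    A single-label set is a confident prediction, a two-label set means the model cannot separate the classes at the chosen level, and an empty set suggests the compound should not be trusted to the model at all.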

  • Article type: Journal Article
    The process of drug discovery is widely known to be lengthy and resource-intensive. Artificial intelligence approaches bring hope for accelerating the identification of molecules with the necessary properties for drug development. Drug-likeness assessment is crucial for the virtual screening of candidate drugs. However, traditional methods such as the Quantitative Estimate of Drug-likeness (QED) struggle to distinguish accurately between drug and non-drug molecules. Additionally, some deep learning-based binary classification models rely heavily on the choice of negative training sets. To address these challenges, we introduce DrugMetric, a novel unsupervised learning framework for quantitatively assessing drug-likeness based on chemical space distance. DrugMetric blends the powerful learning ability of variational autoencoders with the discriminative ability of the Gaussian mixture model. This synergy enables DrugMetric to identify significant differences in drug-likeness across different datasets effectively. Moreover, DrugMetric incorporates principles of ensemble learning to enhance its predictive capabilities. When tested over a variety of tasks and datasets, DrugMetric consistently shows superior scoring and classification performance. It excels in quantifying drug-likeness and accurately distinguishing candidate drugs from non-drugs, surpassing traditional methods including QED. This work highlights DrugMetric as a practical tool for drug-likeness scoring that can accelerate virtual drug screening and has potential applications in other biochemical fields.

  • Article type: Journal Article
    This work examines the current landscape of drug discovery and development, with a particular focus on the chemical and pharmacological spaces. It emphasizes the importance of understanding these spaces to anticipate future trends in drug discovery. Cheminformatics and data analysis enabled in silico exploration of these spaces, providing a perspective on drugs, drugs approved after 2020, and clinical candidates, all extracted from the newly released ChEMBL34 (March 2024). This perspective on chemical and pharmacological spaces enables the identification of trends and of areas still to be occupied, thereby creating opportunities for more effective and targeted drug discovery and development strategies in the future.
