QSPR

  • Article type: Journal Article
    This study investigates quantitative structure-property relationship (QSPR) modeling of guar gum biomolecules, focusing on their structural parameters. Guar gum, a polysaccharide with diverse industrial applications, exhibits properties such as viscosity, solubility, and emulsifying ability that are influenced by its molecular structure. In this research, the M-polynomial and associated topological indices are employed as structural descriptors to represent the molecular structure of guar gum. These descriptors capture important structural features, including size, shape, branching, and connectivity. By correlating them with experimental data on guar gum properties, predictive models are developed using regression analysis. The analysis revealed strong correlations of both boiling point and molecular weight with all the topological descriptors considered. The resulting models offer insight into the relationship between guar gum structure and its properties, supporting the optimization of guar gum production and its application across industries. This study demonstrates the utility of the M-polynomial and QSPR modeling for elucidating structure-property relationships of complex biomolecules such as guar gum, contributing to biomaterial science and its industrial applications.
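The abstract does not give the guar gum graph or list the indices used. The following is therefore only a minimal sketch of how an M-polynomial and a few degree-based indices are commonly computed from a hydrogen-suppressed molecular graph.

It assumes the standard definition M(G; x, y) = Σ m_ij x^i y^j (i ≤ j), where m_ij is the number of edges whose end vertices have degrees i and j. The indices follow from the usual operator identities evaluated at x = y = 1:

- first Zagreb: M1 = Σ m_ij (i + j)
- second Zagreb: M2 = Σ m_ij·i·j
- Randić: R = Σ m_ij/√(ij)
- harmonic: H = Σ 2m_ij/(i + j)

The function names and the toy graph (the isobutane carbon skeleton) are hypothetical illustrations.

```python
from collections import Counter
from math import sqrt


def vertex_degrees(edges):
    """Degree of every vertex in a simple undirected graph given as an edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def m_polynomial(edges):
    """Coefficients m_ij (i <= j) of M(G; x, y) as a dict {(i, j): m_ij}."""
    degree = vertex_degrees(edges)
    coefficients = Counter()
    for u, v in edges:
        i, j = sorted((degree[u], degree[v]))
        coefficients[(i, j)] += 1
    return dict(coefficients)


def degree_based_indices(coefficients):
    """Indices recovered from the M-polynomial coefficients (evaluated at x = y = 1)."""
    items = coefficients.items()
    return {
        "M1": sum(m * (i + j) for (i, j), m in items),      # (Dx + Dy) M
        "M2": sum(m * i * j for (i, j), m in items),        # Dx Dy M
        "R": sum(m / sqrt(i * j) for (i, j), m in items),   # Randic index R_{-1/2}
        "H": sum(2 * m / (i + j) for (i, j), m in items),   # 2 Sx J M (harmonic index)
    }


# Hypothetical toy graph: isobutane carbon skeleton (vertex 0 is the central carbon).
isobutane = [(0, 1), (0, 2), (0, 3)]
coeffs = m_polynomial(isobutane)
print(coeffs)  # {(1, 3): 3}
print(degree_based_indices(coeffs))
```

In isobutane, all three C-C edges join a degree-1 vertex to a degree-3 vertex. So M = 3xy³, which gives M1 = 12, M2 = 9 and H = 1.5. In a QSPR study, indices like these would then be regressed against properties such as boiling point.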

  • Article type: Journal Article
    This study describes the development and evaluation of six new models for predicting physical-chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility in water (SW) and in octanol (SO), vapor pressure (VP), and the octanol-water (KOW), octanol-air (KOA), and air-water (KAW) partition ratios. The models are implemented in the Iterative Fragment Selection Quantitative Structure-Activity Relationship (IFSQSAR) Python package, version 1.1.0, as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations that combine experimentally calibrated system parameters with solute descriptors predicted by QSPRs. Two ancillary models have also been developed and implemented: a QSPR for molar volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing the applicability domain (AD) and for calculating uncertainty estimates, expressed as 95% prediction intervals (PIs) for predicted properties, are described and tested on 9,000 measured partition ratios and 4,000 VP and SW values. These measured data are external to the IFSQSAR training and validation datasets and are used to assess, in an unbiased manner, the predictivity of the models for "novel chemicals". The 95% PIs for partition ratios calculated from the validation datasets needed to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and SW are more uncertain, primarily because of the difficulty of determining whether a chemical is liquid or solid at room temperature. For log KOW, log KAW, and log KOA of novel, data-poor chemicals, the root mean squared error of prediction (RMSEP) is estimated at 0.7 to 1.4. For log VP and log SW, the RMSEP is 1.7-1.8.
    Scientific contribution: New partitioning models integrate empirical PPLFER equations and QSARs, allowing seamless integration of experimental data and model predictions. This work tests the real predictivity of the models for novel chemicals that are not in the model training or external validation datasets.
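The sketch below illustrates two ideas from this abstract.

- It evaluates a generic Abraham-type PPLFER equation, log K = c + eE + sS + aA + bB + vV.
- It finds the factor by which 95% PI half-widths must be widened to cover a target fraction of external data. The study reports 1.25 for partition ratios.

The system parameters and solute descriptors below are made-up placeholders, not IFSQSAR values. The helper names are hypothetical, and this is not the IFSQSAR API.

```python
# Generic Abraham-type PPLFER form: system parameters (c, e, s, a, b, v)
# multiply solute descriptors (E, S, A, B, V).
PPLFER_TERMS = (("e", "E"), ("s", "S"), ("a", "A"), ("b", "B"), ("v", "V"))


def pplfer_log_k(system, solute):
    """log K = c + eE + sS + aA + bB + vV for one solute in one system."""
    return system["c"] + sum(system[p] * solute[d] for p, d in PPLFER_TERMS)


def pi_coverage(observed, predicted, half_widths, scale=1.0):
    """Fraction of observations inside prediction +/- scale * half_width."""
    hits = sum(
        abs(o - p) <= scale * h
        for o, p, h in zip(observed, predicted, half_widths)
    )
    return hits / len(observed)


def smallest_scale(observed, predicted, half_widths,
                   target=0.95, step=0.05, max_scale=3.0):
    """Smallest widening factor (>= 1, on a grid of `step`) reaching `target` coverage."""
    n_steps = int(round((max_scale - 1.0) / step))
    for k in range(n_steps + 1):
        scale = round(1.0 + k * step, 10)
        if pi_coverage(observed, predicted, half_widths, scale) >= target:
            return scale
    return None  # target not reached within max_scale


# Placeholder values only, not IFSQSAR parameters.
system = {"c": 0.1, "e": 0.5, "s": -1.0, "a": 0.2, "b": -3.0, "v": 4.0}
solute = {"E": 0.8, "S": 0.5, "A": 0.0, "B": 0.2, "V": 1.0}
print(pplfer_log_k(system, solute))
```

With real data, `observed` would hold the external measurements and `half_widths` the PI half-widths derived from the validation datasets.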

  • Article type: Journal Article
    Thermoelectric materials with a high Seebeck coefficient, high electrical conductivity, and low thermal conductivity are required to directly and efficiently convert unused heat into electricity. In this study, we construct models predicting the Seebeck coefficient, electrical conductivity, and thermal conductivity using existing material databases. In addition to the ratios of atoms in the crystals and temperature at which the materials are used, the values from the X-ray diffraction (XRD) spectra were used as inputs to represent the crystal structure of the materials. It was confirmed that the constructed models could predict the properties with high accuracy using the X-ray diffraction values. Additionally, using the constructed models, we succeeded in proposing promising new candidate materials with high Seebeck coefficients, high electric conductivities, and low thermal conductivities.
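The abstract asks for a high Seebeck coefficient S, high electrical conductivity σ and low thermal conductivity κ at the same time. These are conventionally combined into the dimensionless figure of merit zT = S²σT/κ, where S²σ is the power factor. This is standard thermoelectrics background, not something stated in the study. Below is a minimal sketch in SI units with illustrative input values; the function names are hypothetical.

```python
def power_factor(seebeck_v_per_k, sigma_s_per_m):
    """Power factor S^2 * sigma in W m^-1 K^-2."""
    return seebeck_v_per_k ** 2 * sigma_s_per_m


def figure_of_merit(seebeck_v_per_k, sigma_s_per_m, kappa_w_per_m_k, temperature_k):
    """Dimensionless zT = S^2 * sigma * T / kappa."""
    if kappa_w_per_m_k <= 0:
        raise ValueError("thermal conductivity must be positive")
    return power_factor(seebeck_v_per_k, sigma_s_per_m) * temperature_k / kappa_w_per_m_k


# Illustrative values: S = 200 uV/K, sigma = 1e5 S/m, kappa = 1.5 W/(m K), T = 300 K.
print(figure_of_merit(200e-6, 1e5, 1.5, 300.0))  # approximately 0.8
```

zT rises with S² and σ but falls with κ. A screening workflow built on separate models for the three properties could therefore rank candidates with a combined score of this kind.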

  • Article type: Journal Article
    Aim: This study aims to investigate the passive diffusion of protein kinase inhibitors through the blood-brain barrier (BBB) and to develop a model for their permeability prediction. Materials & methods: We used the parallel artificial membrane permeability assay to obtain logPe values of each of 34 compounds and calculated descriptors for these structures to perform quantitative structure-property relationship modeling, creating different regression models. Results: The logPe values have been calculated for all 34 compounds. Support vector machine regression was considered the most reliable, and CATS2D_09_DA, CATS2D_04_AA, B04[N-S] and F07[C-N] descriptors were identified as the most influential to passive BBB permeability. Conclusion: The quantitative structure-property relationship-support vector machine regression model that has been generated can serve as an efficient method for preliminary screening of BBB permeability of new analogs.
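The abstract does not state how log Pe was derived from the PAMPA readouts. One commonly used form is:

Pe = −[V_D·V_A/((V_D + V_A)·A·t)]·ln(1 − C_A(t)/C_eq), where C_eq = (C_D(t)·V_D + C_A(t)·V_A)/(V_D + V_A).

The sketch below assumes that form, with volumes in cm³, area in cm² and time in s, so Pe comes out in cm/s. The numbers in the example and the function names are hypothetical.

```python
import math


def effective_permeability(c_donor, c_acceptor, v_donor, v_acceptor, area_cm2, time_s):
    """Pe (cm/s) from donor/acceptor concentrations at time t (same units for both)."""
    c_eq = (c_donor * v_donor + c_acceptor * v_acceptor) / (v_donor + v_acceptor)
    ratio = c_acceptor / c_eq
    if not 0.0 < ratio < 1.0:
        raise ValueError("acceptor/equilibrium ratio must lie strictly between 0 and 1")
    factor = (v_donor * v_acceptor) / ((v_donor + v_acceptor) * area_cm2 * time_s)
    return -factor * math.log(1.0 - ratio)


def log_pe(*args):
    """log10 of the effective permeability."""
    return math.log10(effective_permeability(*args))


# Hypothetical readout: 0.2 mL on each side, 0.3 cm2 membrane, 4 h incubation.
print(log_pe(80.0, 20.0, 0.2, 0.2, 0.3, 4 * 3600.0))  # roughly -4.9
```

The resulting log Pe values would serve as the response variable for the descriptor-based regression models described in the abstract.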

  • Article type: Journal Article
    Amid growing interest in natural surfactants, selecting the appropriate one for a specific application remains challenging. Microbial surfactants are predominantly represented by rhamnolipids (RLs). The extensive, yet often unsystematized, knowledge of them typically does not transfer beyond the conditions reported in individual publications. This limitation stems from the numerous interdependent variables that characterize microbial surfactant production. We hypothesized that a computational recipe for biosynthesizing RLs with targeted application properties could be developed from existing literature and experimental data. We compiled literature data on RL biosynthesis and micellar solubilization. We augmented these data with our own experimental results on the solubilization of triglycerides (TGs), a topic underrepresented in current literature. Using these data, we constructed two mathematical models:

- one predicting RL characteristics, expressed as log P_RL = f(carbon and nitrogen source, biosynthesis parameters);
- one predicting solubilization efficiency, expressed as log MSR = f(solubilizate, rhamnolipid (e.g., log P_RL), solubilization parameters).

The models, with R² values of 0.581-0.997 and 0.804, respectively, enabled descriptors to be ranked by their significance and by the direction (positive or negative) of their impact on the predicted values. These models have been turned into ready-to-use calculators, designed to streamline the selection of the biosurfactant best suited to an intended application.
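The molar solubilization ratio (MSR) behind log MSR is commonly taken as the moles of solubilizate dissolved per mole of micellized surfactant, (S_tot − S_CMC)/(C_surf − CMC). In practice, this is the slope of solubilizate concentration against surfactant concentration above the CMC. The abstract does not give the study's exact procedure. The sketch below assumes this slope definition and the same molar units for both concentrations. The function names and data are hypothetical.

```python
import math


def msr_slope(surfactant_conc, solubilizate_conc, cmc):
    """Least-squares slope of solubilizate vs surfactant concentration above the CMC."""
    points = [(c, s) for c, s in zip(surfactant_conc, solubilizate_conc) if c > cmc]
    if len(points) < 2:
        raise ValueError("need at least two points above the CMC")
    n = len(points)
    mean_c = sum(c for c, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    sxx = sum((c - mean_c) ** 2 for c, _ in points)
    sxy = sum((c - mean_c) * (s - mean_s) for c, s in points)
    return sxy / sxx


def log_msr(surfactant_conc, solubilizate_conc, cmc):
    """log10 of the molar solubilization ratio."""
    return math.log10(msr_slope(surfactant_conc, solubilizate_conc, cmc))


# Hypothetical data (mM): flat solubility below a CMC of 0.2, linear rise above it.
surfactant = [0.1, 0.2, 0.5, 1.0, 2.0]
solubilized = [0.01 + 0.2 * max(0.0, c - 0.2) for c in surfactant]
print(msr_slope(surfactant, solubilized, 0.2), log_msr(surfactant, solubilized, 0.2))
```

Points at or below the CMC are excluded because solubilization by micelles only begins once micelles form.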

  • Article type: Journal Article
    In this study, we develop Quantitative Structure-Property Relationship (QSPR) models to predict the critical micelle concentration (CMC) of per- and polyfluoroalkyl substances (PFASs). Experimental CMC values for both fluorinated and non-fluorinated compounds were carefully compiled from the literature. We constructed two types of Support Vector Machine (SVM) models on this dataset. Type (I) models were trained exclusively on CMC values of fluorinated compounds. Type (II) models were trained on the entire dataset, including both fluorinated and non-fluorinated compounds. The models were compared against reference data and with each other. Both types of models showed robust predictive capability and high reliability. The model with the broadest applicability domain was then selected to supplement the existing experimental data, improving our understanding of PFAS behaviour.
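The abstract does not say how the applicability domain (AD) of each model was assessed. A common, model-independent choice is the leverage approach:

- A compound's leverage is h = xᵀ(XᵀX)⁻¹x, computed in the training descriptor space.
- Compounds with h above the warning value h* = 3(p + 1)/n are treated as outside the AD.

The sketch below assumes that approach, adds an intercept column to the descriptors, and assumes small, well-conditioned descriptor matrices. It uses no third-party libraries, and all names are hypothetical.

```python
def _gram(X):
    """X^T X for a matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]


def _invert(M):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(M)
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def leverages(train_descriptors, query_descriptors):
    """h = x^T (X^T X)^-1 x, with an intercept column prepended (assumption)."""
    X = [[1.0] + list(row) for row in train_descriptors]
    inv = _invert(_gram(X))
    result = []
    for q in query_descriptors:
        x = [1.0] + list(q)
        result.append(sum(x[a] * inv[a][b] * x[b]
                          for a in range(len(x)) for b in range(len(x))))
    return result


def warning_leverage(n_train, n_descriptors):
    """h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


train = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # hypothetical single-descriptor set
print(leverages(train, [[2.0], [10.0]]), warning_leverage(len(train), 1))
```

In this toy case, the query at the training mean has the minimum leverage 1/n. The extrapolated query at 10 far exceeds h*, so it would be flagged as outside the AD.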

  • Article type: Journal Article
    BACKGROUND: Veterinary antibiotics are chemical compounds used to kill or inhibit the growth of pathogenic bacteria associated with animal diseases. These molecules can be defined by their retention times (tR) in liquid chromatography-mass spectrometry (LC-MS). One strategy to predict the tR of new veterinary antibiotics is the development of predictive quantitative structure-property relationships (QSPRs), which were used in this study.
    RESULTS: A database of 122 antibiotics, whose tR values were measured on a Hypersil GOLD column, was selected. An optimal three-feature model was developed by combining unsupervised variable reduction, replacement-method variable subset selection, and multiple linear regression. The training set (R² = 0.902, RMSEC = 0.871) and the test set (Q² = 0.854, RMSEP = 1.064) differ only slightly in coefficient of determination and root-mean-square error, which indicates a stable and predictive model. In a further step, the role of each descriptor in predicting tR is explained in more depth. A theoretical chemical space is also constructed for accurate prediction of new antibiotics.
    CONCLUSIONS: The in silico model developed in this work identified three molecular descriptors, associated with aqueous solubility, the octanol-water partition coefficient, and the presence of negative and lipophilic atom pairs. The QSPR developed here could be used by agricultural and food chemists to identify and monitor existing and new antibiotics within an LC-MS framework. The computational model was developed in accordance with the five principles for QSAR model validation set out by the Organisation for Economic Co-operation and Development (OECD). © 2024 Society of Chemical Industry.
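The statistics quoted here (R², RMSEC, Q², RMSEP) can be recomputed from observed and predicted tR values. The sketch below is a minimal version with two assumptions, since the abstract does not say which definitions were used:

- plain n in the RMSE denominator (some authors use n − p − 1 for RMSEC);
- the Q²F1 form for the external Q², which uses the training-set mean.

The helper names and data are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error with n in the denominator."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot about the set's own mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def q2_f1(observed_test, predicted_test, train_mean):
    """External Q2 (F1 form): test residuals compared against the training-set mean."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed_test, predicted_test))
    ss_tot = sum((o - train_mean) ** 2 for o in observed_test)
    return 1.0 - ss_res / ss_tot


# Hypothetical retention times (min) for a tiny test set.
obs_test, pred_test = [4.1, 6.0, 8.2], [4.5, 5.6, 8.0]
print(rmse(obs_test, pred_test), q2_f1(obs_test, pred_test, train_mean=6.3))
```

Comparable training and test values of these metrics are what the abstract takes as evidence of a stable, non-overfitted model.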

  • Article type: Journal Article
    Accurately predicting plant cuticle-air partition coefficients (Kca) is essential for assessing the ecological risk of organic pollutants and for elucidating their partitioning mechanisms. This work collected 255 measured Kca values covering 25 plant species and 106 compounds (dataset (I)). These values were averaged per compound to build dataset (II), which contains Kca values for the 106 compounds. Four machine-learning algorithms were used to develop eight QSPR models for predicting Kca: multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT). The developed models showed high goodness of fit as well as good robustness and predictive performance. The GBDT-2 model (R²adj = 0.925, Q²LOO = 0.756, Q²BOOT = 0.864, R²ext = 0.837, Q²ext = 0.811, and CCC = 0.891) is recommended as the best model for predicting Kca. The GBDT-1 and GBDT-2 models were also interpreted with the Shapley additive explanations (SHAP) method. This showed how molecular properties such as molecular size, polarizability, and molecular complexity affect the capacity of plant cuticles to adsorb airborne organic pollutants. The satisfactory performance of the models suggests broad potential applications: assessing the environmental fate of organic pollutants and promoting eco-friendly, sustainable chemical engineering.
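Among the statistics listed for GBDT-2, CCC is Lin's concordance correlation coefficient. It penalizes both scatter and systematic offset between observed and predicted values:

CCC = 2s_xy/(s_x² + s_y² + (x̄ − ȳ)²)

Below is a minimal sketch using population (1/n) moments. The function name and data are hypothetical.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient with 1/n (population) moments."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# A constant offset keeps Pearson r at 1 but lowers CCC.
print(concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 4/7, about 0.571
```

Unlike R², CCC drops when predictions are consistently shifted from the observations. This makes it a useful complement to the other external-validation metrics.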

  • Article type: Journal Article
    The critical micelle concentration (CMC) is a key parameter for evaluating surfactants and an essential tool for studying surfactant properties across industrial fields. In this research, we assembled a set of 593 surfactants from different classes: anionic, cationic, nonionic, zwitterionic, and Gemini. Using quantitative structure-property relationship (QSPR) methods, we linked their molecular structure to the negative logarithm of the critical micelle concentration (pCMC). Statistical analysis showed that 14 significant Mordred descriptors, together with temperature, served as appropriate inputs. The descriptors were SlogP, GATS6d, nAcid, GATS8dv, GATS4dv, PEOE_VSA11, GATS8d, ATS0p, GATS1d, MATS5p, GATS3d, NdssC, GATS6dv and EState_VSA4. Several machine learning methods were used to build QSPR models: multiple linear regression (MLR), random forest regression (RFR), artificial neural networks (ANN), and support vector regression (SVR). Based on the statistical coefficients of the QSPR models, SVR with Dragonfly algorithm hyperparameter optimization (SVR-DA) was the most accurate in predicting pCMC values. For the entire dataset it achieved R² = 0.9740, Q² = 0.9739, r̄²m = 0.9627, and Δr²m = 0.0244.
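The r̄²m and Δr²m values quoted for SVR-DA are Roy's r²m metrics. The sketch below assumes the commonly cited form r²m = r²(1 − √|r² − r0²|), where r0² is the squared correlation of a regression through the origin. r0² is computed twice: once with observed values on the y-axis and once with the axes swapped. r̄²m is the mean of the two resulting r²m values and Δr²m is their absolute difference.

Conventions differ between papers and tools, for example the axis order and whether an absolute value is taken inside the root. Treat this as an approximation of the authors' calculation, not a reproduction. The function names are hypothetical.

```python
from math import sqrt


def _pearson_r2(x, y):
    """Squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def _r0_squared(y, x):
    """r0^2 for the regression of y on x through the origin (slope k = sum(xy)/sum(x^2))."""
    k = sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)
    my = sum(y) / len(y)
    ss_res = sum((a - k * b) ** 2 for a, b in zip(y, x))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rm2_metrics(observed, predicted):
    """Return (mean r2m, delta r2m) from the two axis orientations."""
    r2 = _pearson_r2(observed, predicted)
    rm2 = r2 * (1.0 - sqrt(abs(r2 - _r0_squared(observed, predicted))))
    rm2_swapped = r2 * (1.0 - sqrt(abs(r2 - _r0_squared(predicted, observed))))
    return (rm2 + rm2_swapped) / 2.0, abs(rm2 - rm2_swapped)


# Hypothetical pCMC values: predictions offset by +1 from observations.
print(rm2_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

A constant offset leaves r² at 1 but lowers r̄²m and makes Δr²m nonzero. This is why these metrics are reported alongside R² and Q².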

  • Article type: Journal Article
    Computer-aided drug design has advanced rapidly in recent years. Multiple in silico designed molecules have advanced to the clinic, demonstrating the field's contribution to medicine. Properly designed and implemented platforms can drastically reduce drug development timelines and costs. Such efforts initially focused primarily on target affinity/activity. It is now appreciated that other parameters are equally important for a drug's successful development and progression to the clinic. These include pharmacokinetic properties as well as absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. In the last decade, several programs have been developed that incorporate these properties into the drug design and optimization process and, to varying degrees, allow multi-parameter optimization. Here, we introduce the Artificial Intelligence-driven Drug Design (AIDD) platform, which automates the drug design process. It integrates high-throughput physiologically based pharmacokinetic simulations (powered by GastroPlus) and ADMET predictions (powered by ADMET Predictor) with an advanced evolutionary algorithm that differs substantially from current generative models. AIDD uses these and other estimates to iteratively perform multi-objective optimization, producing novel molecules that are active and lead-like. We describe the AIDD workflow and the details of the methodologies involved. We then use a dataset of triazolopyrimidine inhibitors of Plasmodium falciparum dihydroorotate dehydrogenase to illustrate how AIDD generates novel sets of molecules.
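The abstract does not describe AIDD's evolutionary algorithm. The following is therefore only a generic illustration of a selection step that multi-objective evolutionary methods typically rely on. When several scores must be maximized together, such as predicted activity and an ADMET-derived score, only the Pareto-non-dominated candidates are kept. Candidate names and scores are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is >= b in every objective and > b in at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates not dominated by any other candidate, sorted."""
    return sorted(
        name for name, score in candidates.items()
        if not any(dominates(other_score, score)
                   for other, other_score in candidates.items() if other != name)
    )


# Hypothetical (activity score, ADMET score) pairs, higher is better for both.
pool = {"m1": (0.9, 0.2), "m2": (0.5, 0.8), "m3": (0.4, 0.1), "m4": (0.6, 0.7)}
print(pareto_front(pool))  # ['m1', 'm2', 'm4']
```

In an evolutionary loop, the surviving front would be mutated or recombined to propose the next generation. That part is not sketched here.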