QSPR

  • Article type: Journal Article
    A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive atomic descriptors: delta latent space vectors (DLSVs), obtained from the original SMILES of the whole molecule and the SMILES of the same molecule with the target atom replaced. Three replacements were explored: changing the atomic element, substituting a character of the model vocabulary not used in the training set, or removing the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed remarkable clustering according to atomic element, hybridization, atomic type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict ¹⁹F NMR chemical shifts. An R² of up to 0.89 and mean absolute errors of up to 5.5 ppm were obtained for an independent test set of 1046 molecules with random forests or a gradient-boosting regressor. Intermediate representations from a Transformer model yielded comparable results. Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H→F substitution) was added to the latent space vectors (LSVs) of 4135 new molecules with no fluorine atom and decoded into SMILES, yielding 99% valid SMILES, with 75% of the SMILES incorporating fluorine and 56% of the structures incorporating fluorine with no other structural change.
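The delta-vector arithmetic described in this abstract can be illustrated with a minimal sketch. It assumes the latent space vectors are already available as plain lists of floats; the toy numbers and the helper names `delta_lsv` and `apply_operator` are hypothetical, and the sign convention (modified minus original) is an assumption, since the abstract does not state it. No encoder or decoder is modeled here.

```python
from typing import List

Vector = List[float]


def delta_lsv(lsv_a: Vector, lsv_b: Vector) -> Vector:
    """Element-wise difference lsv_a - lsv_b of two latent vectors."""
    if len(lsv_a) != len(lsv_b):
        raise ValueError("latent vectors must have the same dimension")
    return [a - b for a, b in zip(lsv_a, lsv_b)]


def apply_operator(lsv: Vector, operator: Vector) -> Vector:
    """Add a delta vector (used as a latent-space operator) to a latent vector."""
    if len(lsv) != len(operator):
        raise ValueError("latent vectors must have the same dimension")
    return [a + b for a, b in zip(lsv, operator)]


# Toy 3-dimensional latent vectors (made-up values, not model output).
lsv_with_h = [0.5, 1.0, -2.0]   # molecule with H at the target position
lsv_with_f = [1.0, 0.25, -1.5]  # same molecule with H replaced by F

# Assumed convention: operator = fluorinated minus original.
h_to_f = delta_lsv(lsv_with_f, lsv_with_h)

# Shift the latent vector of another (toy) molecule by the operator.
shifted = apply_operator([2.0, 2.0, 2.0], h_to_f)
```

In the workflow the abstract describes, the shifted vector would then be passed to the trained decoder to obtain a SMILES string; that step is outside the scope of this sketch.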

  • Article type: Journal Article
    QSPR mathematically links physicochemical properties with the structure of a molecule. The physicochemical properties of chemical molecules can be predicted using topological indices, an effective way to avoid costly and time-consuming laboratory tests. We established a QSPR between mev-degree and mve-degree-based indices and the physical properties of benzenoid hydrocarbons. To compute these indices, we designed a program in Maple, and the correlation between indices and physical properties was developed using SPSS. Our study reveals that the mve-degree-based sum-connectivity (χ_mve) and atom-bond connectivity (ABC_mve) indices and the mev-degree-based Randić (R_mev) and Zagreb (M_mev) indices are the most significant parameters and have good prediction ability for the physicochemical properties. We found that R_mev predicts the molar refractivity and boiling point, χ_mve predicts the LogP and enthalpy, ABC_mve predicts the molecular weight, and M_mev predicts the Gibbs energy, π-electron energy, and Henry's law. Moreover, we computed the indices for the linear [n]-phenylene.
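As a rough illustration of how degree-based topological indices are computed, the sketch below evaluates the classical Randić, sum-connectivity, atom-bond connectivity (ABC), and first Zagreb indices on a hydrogen-suppressed molecular graph. Note the substitution: these are the ordinary vertex-degree versions, not the mev-degree or mve-degree variants used in the study, whose definitions the abstract does not give. The function names and the edge-list representation are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def degrees(edges: List[Edge]) -> Dict[int, int]:
    """Vertex degrees of an undirected graph given as an edge list."""
    deg: Dict[int, int] = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def randic_index(edges: List[Edge]) -> float:
    """Sum over edges of 1 / sqrt(d_u * d_v)."""
    d = degrees(edges)
    return sum(1.0 / math.sqrt(d[u] * d[v]) for u, v in edges)


def sum_connectivity_index(edges: List[Edge]) -> float:
    """Sum over edges of 1 / sqrt(d_u + d_v)."""
    d = degrees(edges)
    return sum(1.0 / math.sqrt(d[u] + d[v]) for u, v in edges)


def abc_index(edges: List[Edge]) -> float:
    """Sum over edges of sqrt((d_u + d_v - 2) / (d_u * d_v))."""
    d = degrees(edges)
    return sum(math.sqrt((d[u] + d[v] - 2) / (d[u] * d[v])) for u, v in edges)


def first_zagreb_index(edges: List[Edge]) -> int:
    """Sum over vertices of the squared degree."""
    return sum(k * k for k in degrees(edges).values())


# Carbon skeleton of benzene: a 6-cycle in which every vertex has degree 2.
benzene = [(i, (i + 1) % 6) for i in range(6)]
```

For the benzene skeleton every edge joins two degree-2 vertices, so the Randić and sum-connectivity indices both equal 6 × 1/2 = 3, the ABC index equals 6 × √(1/2), and the first Zagreb index equals 6 × 4 = 24.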

  • Article type: Journal Article
    This study investigates the quantitative structure-property relationship (QSPR) modeling of guar gum biomolecules, focusing on their structural parameters. Guar gum, a polysaccharide with diverse industrial applications, exhibits various properties such as viscosity, solubility, and emulsifying ability, which are influenced by its molecular structure. In this research, the M-polynomial and associated topological indices are employed as structural descriptors to represent the molecular structure of guar gum. These descriptors capture important structural features, including size, shape, branching, and connectivity. By correlating these descriptors with experimental data on guar gum properties, predictive models are developed using regression analysis techniques. The analysis revealed strong correlations of both boiling point and molecular weight with all the considered topological descriptors. The resulting models offer insights into the relationship between guar gum structure and its properties, facilitating the optimization of guar gum production and application in various industries. This study demonstrates the utility of the M-polynomial and QSPR modeling in elucidating structure-property relationships of complex biomolecules like guar gum, contributing to the advancement of biomaterial science and industrial applications.

  • Article type: Journal Article
    This study describes the development and evaluation of six new models for predicting physical-chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility (in water, SW, and in octanol, SO), vapor pressure (VP), and the octanol-water (KOW), octanol-air (KOA), and air-water (KAW) partition ratios. The models are implemented in the Iterative Fragment Selection Quantitative Structure-Activity Relationship (IFSQSAR) Python package, version 1.1.0, as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations, which combine experimentally calibrated system parameters with solute descriptors predicted by QSPRs. Two ancillary models have also been developed and implemented: a QSPR for molar volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing the applicability domain (AD) and calculating uncertainty estimates, expressed as 95% prediction intervals (PIs) for predicted properties, are described and tested on 9,000 measured partition ratios and 4,000 VP and SW values. The measured data are external to the IFSQSAR training and validation datasets and are used to assess the predictivity of the models for "novel chemicals" in an unbiased manner. The 95% PIs calculated from validation datasets for partition ratios needed to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and SW are more uncertain, primarily because of the challenge of differentiating the physical state of chemicals (i.e., liquid or solid) at room temperature. The prediction accuracy of the models for log KOW, log KAW, and log KOA of novel, data-poor chemicals is estimated to be in the range of 0.7 to 1.4 root mean squared error of prediction (RMSEP), with RMSEP in the range 1.7-1.8 for log VP and log SW.
    Scientific contribution: New partitioning models integrate empirical PPLFER equations and QSARs, allowing for seamless integration of experimental data and model predictions. This work tests the real predictivity of the models for novel chemicals which are not in the model training or external validation datasets.
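To make the structure of a poly-parameter LFER concrete, the following sketch evaluates a generic Abraham-type equation, log K = c + eE + sS + aA + bB + vV, and widens a prediction interval by the factor of 1.25 mentioned in the abstract. This is not the IFSQSAR API: the function names, the choice of descriptor set, every coefficient and descriptor value, and the reading of "scaled" as multiplying a symmetric half-width are assumptions made purely for illustration.

```python
from typing import Dict, Tuple

DESCRIPTORS = ("E", "S", "A", "B", "V")


def pplfer_log_k(system: Dict[str, float], solute: Dict[str, float]) -> float:
    """Evaluate log K = c + e*E + s*S + a*A + b*B + v*V.

    `system` holds the calibrated system parameters (keys c, e, s, a, b, v);
    `solute` holds the solute descriptors (keys E, S, A, B, V).
    """
    return system["c"] + sum(system[k.lower()] * solute[k] for k in DESCRIPTORS)


def scaled_interval(
    prediction: float, half_width: float, scale: float = 1.25
) -> Tuple[float, float]:
    """Symmetric interval around a prediction with the half-width scaled."""
    width = half_width * scale
    return (prediction - width, prediction + width)


# Hypothetical system parameters and solute descriptors (made-up numbers).
system = {"c": 0.0, "e": 0.5, "s": -1.0, "a": 0.0, "b": -3.0, "v": 4.0}
solute = {"E": 1.0, "S": 0.5, "A": 0.0, "B": 0.25, "V": 1.0}

log_k = pplfer_log_k(system, solute)
interval = scaled_interval(log_k, half_width=1.0)
```

With these toy numbers the terms are 0.5 − 0.5 + 0 − 0.75 + 4.0, giving log K = 3.25, and a half-width of 1.0 log unit becomes 1.25 after scaling, so the interval runs from 2.0 to 4.5.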

  • Article type: Journal Article
    Thermoelectric materials with a high Seebeck coefficient, high electrical conductivity, and low thermal conductivity are required to directly and efficiently convert unused heat into electricity. In this study, we constructed models predicting the Seebeck coefficient, electrical conductivity, and thermal conductivity using existing material databases. In addition to the ratios of atoms in the crystals and the temperature at which the materials are used, values from the X-ray diffraction (XRD) spectra were used as inputs to represent the crystal structure of the materials. It was confirmed that the constructed models could predict the properties with high accuracy using the XRD values. Additionally, using the constructed models, we succeeded in proposing promising new candidate materials with high Seebeck coefficients, high electrical conductivities, and low thermal conductivities.

  • Article type: Journal Article
    Aim: This study aims to investigate the passive diffusion of protein kinase inhibitors through the blood-brain barrier (BBB) and to develop a model for their permeability prediction. Materials & methods: We used the parallel artificial membrane permeability assay to obtain logPe values of each of 34 compounds and calculated descriptors for these structures to perform quantitative structure-property relationship modeling, creating different regression models. Results: The logPe values have been calculated for all 34 compounds. Support vector machine regression was considered the most reliable, and CATS2D_09_DA, CATS2D_04_AA, B04[N-S] and F07[C-N] descriptors were identified as the most influential to passive BBB permeability. Conclusion: The quantitative structure-property relationship-support vector machine regression model that has been generated can serve as an efficient method for preliminary screening of BBB permeability of new analogs.

  • Article type: Journal Article
    Amid growing interest in natural surfactants, selecting the appropriate one for a specific application remains challenging. The extensive, yet often unsystematized, knowledge of microbial surfactants, predominantly represented by rhamnolipids (RLs), typically does not translate beyond the conditions presented in scientific publications. This limitation stems from the numerous variables and their interdependencies that characterize microbial surfactant production. We hypothesized that a computational recipe for biosynthesizing RLs with targeted application properties could be developed from existing literature and experimental data. We amassed literature data on RL biosynthesis and micellar solubilization and augmented it with our experimental results on the solubilization of triglycerides (TGs), a topic underrepresented in current literature. Using these data, we constructed mathematical models that can predict RL characteristics and solubilization efficiency, represented as logPRL = f(carbon and nitrogen source, parameters of biosynthesis) and logMSR = f(solubilizate, rhamnolipid (e.g. logPRL), parameters of solubilization), respectively. The models, characterized by R² values of 0.581-0.997 and 0.804, respectively, enabled the ranking of descriptors based on their significance and their impact (positive or negative) on the predicted values. These models have been translated into ready-to-use calculators, tools designed to streamline the selection of a biosurfactant optimally suited for the intended application.

  • Article type: Journal Article
    In this study, we focus on the development of Quantitative Structure-Property Relationship (QSPR) models to predict the critical micelle concentration (CMC) for per- and polyfluoroalkyl substances (PFASs). Experimental CMC values for both fluorinated and non-fluorinated compounds were meticulously compiled from existing literature sources. Our approach involved constructing two distinct types of models based on Support Vector Machine (SVM) algorithms applied to the dataset. Type (I) models were trained exclusively on CMC values for fluorinated compounds, while Type (II) models were developed utilizing the entire dataset, incorporating both fluorinated and non-fluorinated compounds. Comparative analyses were conducted against reference data, as well as between the two model types. Encouragingly, both types of models exhibited robust predictive capabilities and demonstrated high reliability. Subsequently, the model having the broadest applicability domain was selected to complement the existing experimental data, thereby enhancing our understanding of PFAS behaviour.

  • Article type: Journal Article
    BACKGROUND: Veterinary antibiotics are chemical compounds used to kill or inhibit the growth of pathogenic bacteria associated with animal diseases. These molecules can be defined by their retention times (tR) in liquid chromatography-mass spectrometry (LC-MS). One strategy to predict the tR of new veterinary antibiotics is the development of predictive quantitative structure-property relationships (QSPRs), which were used in this study.
    RESULTS: A database of 122 antibiotics was selected in which the tR was measured using a Hypersil GOLD column. An optimal three-feature model was developed by integrating unsupervised variable reduction, replacement method variable subset selection, and multiple linear regression. The negligible differences between the coefficients of determination and the root-mean-square errors for the training set (R² = 0.902 and RMSEC = 0.871) and the test set (Q² = 0.854 and RMSEP = 1.064) indicate a stable and predictive model. In a further step, a more in-depth explanation of the mechanism of action of each descriptor in predicting the tR is provided, with the construction of the theoretical chemical space for accurate predictions of new antibiotics.
    CONCLUSIONS: The in silico model developed in this work identified three molecular descriptors associated with aqueous solubility, octanol-water partition coefficient, and the presence of negative and lipophilic atom pairs. The QSPR developed here could be implemented by agricultural and food chemists to identify and monitor existing and new antibiotics within the framework of LC-MS. The computational model was developed in accordance with five principles outlined by the Organization for Economic Co-operation and Development. © 2024 Society of Chemical Industry.
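The two goodness-of-fit statistics quoted in this abstract can be computed as in the sketch below. It uses the common definitions R² = 1 − SS_res/SS_tot and RMSE = √(SS_res/n); the function names and the toy retention times are illustrative assumptions. A test-set Q² is often computed with the training-set mean in SS_tot, a convention that varies between studies and is not specified in the abstract.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


# Toy retention times in minutes (made-up values).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]

fit_r2 = r_squared(observed, predicted)
fit_rmse = rmse(observed, predicted)
```

For the toy data SS_res = 0.5 and SS_tot = 5.0, so R² = 0.9 and RMSE = √0.125.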

  • Article type: Journal Article
    Accurately predicting plant cuticle-air partition coefficients (Kca) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. The current work collected 255 measured Kca values from 25 plant species and 106 compounds (dataset (I)) and averaged them to establish a dataset (dataset (II)) containing Kca values for 106 compounds. Machine-learning algorithms (multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT)) were applied to develop eight QSPR models for predicting Kca. The results showed that the developed models had a high goodness of fit, as well as good robustness and predictive performance. The GBDT-2 model (R²adj = 0.925, Q²LOO = 0.756, Q²BOOT = 0.864, R²ext = 0.837, Q²ext = 0.811, and CCC = 0.891) is recommended as the best model for predicting Kca due to its superior performance. Moreover, interpreting the GBDT-1 and GBDT-2 models based on the Shapley additive explanations (SHAP) method elucidated how molecular properties, such as molecular size, polarizability, and molecular complexity, affected the capacity of plant cuticles to adsorb organic pollutants in the air. The satisfactory performance of the developed models suggests that they have the potential for extensive applications in guiding the environmental fate of organic pollutants and promoting the progress of eco-friendly and sustainable chemical engineering.
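Among the validation statistics listed in this abstract, the concordance correlation coefficient (CCC) is probably the least familiar. The sketch below assumes the abstract refers to Lin's concordance correlation coefficient and computes it as 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²) with population (1/n) moments. The function name and the toy values are assumptions for illustration.

```python
from typing import Sequence


def concordance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient using 1/n moments."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Toy observed and predicted log Kca values (made-up numbers).
observed = [1.0, 2.0, 3.0]
shifted_prediction = [2.0, 3.0, 4.0]  # perfectly correlated but offset by 1

ccc_identity = concordance_correlation(observed, observed)
ccc_shifted = concordance_correlation(observed, shifted_prediction)
```

Unlike the Pearson correlation, the CCC penalizes systematic offsets: the shifted predictions are perfectly correlated with the observations, yet their CCC is 4/7 rather than 1.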