QSPR

  • Article type: Journal Article
    A variational heteroencoder based on recurrent neural networks, trained on SMILES linear notations of molecular structures, was used to derive atomic descriptors: delta latent space vectors (DLSVs), obtained from the SMILES of the whole original molecule and the SMILES of the same molecule with the target atom replaced. Different replacements were explored: changing the atomic element, substituting a character of the model vocabulary that was not used in the training set, or removing the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed remarkable clustering according to atomic element, hybridization, atom type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict ¹⁹F NMR chemical shifts. For an independent test set of 1046 molecules, random forest and gradient-boosting regressors achieved R² values of up to 0.89 and mean absolute errors as low as 5.5 ppm. Intermediate representations from a Transformer model yielded comparable results. Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H→F substitution) was added to the LSVs of 4135 new molecules containing no fluorine atom, and the results were decoded into SMILES. This yielded 99% valid SMILES; 75% of the SMILES incorporated fluorine, and 56% of the structures incorporated fluorine with no other structural change.
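    The DLSV construction amounts to vector arithmetic on latent space vectors (LSVs). The sketch below is a minimal illustration, assuming the LSVs have already been produced by the trained heteroencoder (the encoder and decoder are not shown) and assuming the sign convention DLSV = LSV(modified) - LSV(original); the abstract does not state the direction. The helper names `replace_atom`, `delta_lsv`, and `apply_operator` are hypothetical, and the toy three-dimensional vectors stand in for real encoder output.

```python
from typing import List, Sequence

Vector = List[float]


def replace_atom(smiles: str, start: int, length: int, token: str) -> str:
    """Replace the target atom token smiles[start:start + length] with `token`.

    token="" removes the atom, an element symbol changes the element, and an
    unused vocabulary character (hypothetical, e.g. "?") masks the atom.
    """
    return smiles[:start] + token + smiles[start + length:]


def delta_lsv(lsv_original: Sequence[float], lsv_modified: Sequence[float]) -> Vector:
    """Atomic descriptor: element-wise difference of the two latent vectors."""
    if len(lsv_original) != len(lsv_modified):
        raise ValueError("latent vectors must have the same dimension")
    return [m - o for o, m in zip(lsv_original, lsv_modified)]


def apply_operator(lsv: Sequence[float], dlsv: Sequence[float]) -> Vector:
    """Use a DLSV as a latent-space operator by adding it to another molecule's LSV."""
    if len(lsv) != len(dlsv):
        raise ValueError("vectors must have the same dimension")
    return [x + d for x, d in zip(lsv, dlsv)]


if __name__ == "__main__":
    print(replace_atom("CCF", 2, 1, "Cl"))     # element change: CCCl
    print(replace_atom("CCF", 2, 1, ""))       # atom removal: CC
    d = delta_lsv([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
    print(d)                                   # [0.5, 0.0, -1.0]
    print(apply_operator([0.0, 1.0, 1.0], d))  # [0.5, 1.0, 0.0]
```

    In the halogenation experiment described above, the vector returned by `apply_operator` would then be passed to the decoder to obtain SMILES.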

  • Article type: Journal Article
    QSPR mathematically links the physicochemical properties of a molecule to its structure, and topological indices can be used to predict the physicochemical properties of chemical compounds. This is an effective way to avoid costly and time-consuming laboratory tests. We established QSPRs between mev-degree- and mve-degree-based indices and the physical properties of benzenoid hydrocarbons. The indices were computed with a program written in Maple, and the correlations between the indices and the physical properties were developed in SPSS. Our study reveals that the mve-degree-based sum-connectivity (χmve) and atom-bond connectivity (ABCmve) indices and the mev-degree-based Randić (Rmev) and Zagreb (Mmev) indices are the three most significant parameters and have good predictive ability for the physicochemical properties. We found that Rmev predicts molar refractivity and boiling point, χmve predicts LogP and enthalpy, ABCmve predicts molecular weight, and Mmev predicts Gibbs energy, π-electron energy, and the Henry's law constant. Moreover, we computed the indices for linear [n]-phenylenes.
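    The Randić, sum-connectivity, atom-bond connectivity, and Zagreb indices share one pattern: a sum over bonds (edges) of a function of the two end-point degrees. The sketch below computes them for a hydrogen-suppressed graph given as an edge list. It uses ordinary vertex degrees by default; the mev- and mve-degrees used in the study are not defined in this abstract, so the `degree` argument is a hypothetical hook where those values would be plugged in. The edge form of the first Zagreb index is assumed (with ordinary degrees it equals the sum of squared vertex degrees); the function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def vertex_degrees(edges: List[Edge]) -> Dict[int, int]:
    """Ordinary degree of each vertex in an undirected, hydrogen-suppressed graph."""
    deg: Dict[int, int] = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def degree_indices(edges: List[Edge],
                   degree: Optional[Dict[int, float]] = None) -> Dict[str, float]:
    """Edge-sum indices computed from a per-vertex degree value.

    `degree` defaults to the ordinary degree. Passing mev- or mve-degree
    values instead is a hypothetical hook; their definitions are not given here.
    """
    d = degree if degree is not None else vertex_degrees(edges)
    return {
        # Randic index: sum of 1/sqrt(d_u * d_v)
        "R": sum(1.0 / math.sqrt(d[u] * d[v]) for u, v in edges),
        # sum-connectivity index: sum of 1/sqrt(d_u + d_v)
        "chi": sum(1.0 / math.sqrt(d[u] + d[v]) for u, v in edges),
        # atom-bond connectivity index: sum of sqrt((d_u + d_v - 2) / (d_u * d_v))
        "ABC": sum(math.sqrt((d[u] + d[v] - 2) / (d[u] * d[v])) for u, v in edges),
        # first Zagreb index, edge form: sum of (d_u + d_v)
        "M1": sum(d[u] + d[v] for u, v in edges),
    }


if __name__ == "__main__":
    benzene = [(i, (i + 1) % 6) for i in range(6)]  # six-membered carbon ring
    print(degree_indices(benzene))
```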

  • Article type: Journal Article
    This study investigates quantitative structure-property relationship (QSPR) modeling of guar gum biomolecules, focusing on their structural parameters. Guar gum, a polysaccharide with diverse industrial applications, exhibits properties such as viscosity, solubility, and emulsifying ability that are influenced by its molecular structure. Here, the M-polynomial and its associated topological indices are employed as structural descriptors of guar gum, capturing important structural features including size, shape, branching, and connectivity. By correlating these descriptors with experimental data on guar gum properties, predictive models were developed using regression analysis. The analysis revealed strong correlations of both boiling point and molecular weight with all the topological descriptors considered. The resulting models offer insight into the relationship between guar gum structure and its properties, which can help optimize guar gum production and its application in various industries. This study demonstrates the utility of the M-polynomial and QSPR modeling for elucidating structure-property relationships of complex biomolecules such as guar gum, contributing to biomaterial science and industrial applications.
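    The M-polynomial of a graph is M(G; x, y) = Σ_{i≤j} m_ij x^i y^j, where m_ij counts the edges whose end-point degrees are i and j; degree-based indices then follow from differential or integral operators applied to M at x = y = 1, which reduces to weighted sums over the m_ij. A minimal sketch, assuming a hydrogen-suppressed molecular graph given as an edge list (no encoding of the guar gum repeat unit is given here, so butane is used as a toy input), computes the coefficients and three standard derived indices. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def m_polynomial(edges: List[Edge]) -> Dict[Tuple[int, int], int]:
    """Coefficients m_ij of the M-polynomial, keyed by (i, j) with i <= j."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    coeffs: Counter = Counter()
    for u, v in edges:
        i, j = sorted((deg[u], deg[v]))
        coeffs[(i, j)] += 1
    return dict(coeffs)


def indices_from_m_polynomial(coeffs: Dict[Tuple[int, int], int]) -> Dict[str, float]:
    """Indices obtained from operators applied to M(x, y) at x = y = 1."""
    m1 = sum(m * (i + j) for (i, j), m in coeffs.items())        # (D_x + D_y) M
    m2 = sum(m * i * j for (i, j), m in coeffs.items())          # D_x D_y M
    harm = sum(m * 2.0 / (i + j) for (i, j), m in coeffs.items())  # 2 S_x J M
    return {"M1": m1, "M2": m2, "H": harm}


if __name__ == "__main__":
    butane = [(0, 1), (1, 2), (2, 3)]  # carbon skeleton C-C-C-C
    c = m_polynomial(butane)
    print(c)                           # {(1, 2): 2, (2, 2): 1}
    print(indices_from_m_polynomial(c))
```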

  • Article type: Journal Article
    This study describes the development and evaluation of six new models for predicting physical-chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility in water (SW) and octanol (SO), vapor pressure (VP), and the octanol-water (KOW), octanol-air (KOA), and air-water (KAW) partition ratios. The models are implemented in version 1.1.0 of the Iterative Fragment Selection Quantitative Structure-Activity Relationship (IFSQSAR) Python package as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations, which combine experimentally calibrated system parameters with solute descriptors predicted by QSPRs. Two ancillary models were also developed and implemented: a QSPR for molar volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing the applicability domain (AD) and for calculating uncertainty estimates, expressed as 95% prediction intervals (PIs), are described and tested on 9,000 measured partition ratios and 4,000 VP and SW values. These measured data are external to the IFSQSAR training and validation datasets and are used to assess, in an unbiased manner, the predictivity of the models for "novel chemicals". The 95% PIs for partition ratios calculated from the validation datasets had to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and SW are more uncertain, primarily because of the difficulty of determining the physical state (i.e., liquid or solid) of a chemical at room temperature. For novel, data-poor chemicals, the root mean squared error of prediction (RMSEP) is estimated to be 0.7 to 1.4 for log KOW, log KAW, and log KOA, and 1.7-1.8 for log VP and log SW.
    Scientific contribution: The new partitioning models integrate empirical PPLFER equations and QSARs, allowing seamless integration of experimental data and model predictions. This work tests the real predictivity of the models for novel chemicals that are not in the model training or external validation datasets.
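    A PPLFER equation of the widely used Abraham form is log K = c + eE + sS + aA + bB + vV (processes involving the gas phase typically use lL in place of vV), where the lowercase letters are system parameters and the uppercase letters are solute descriptors. The sketch below evaluates such an equation, widens a 95% PI by the empirical factor of 1.25 reported above, and computes RMSEP and interval coverage. It does not use the IFSQSAR API; the function names are hypothetical and all numbers in the usage lines are placeholders, not IFSQSAR's calibrated parameters.

```python
import math
from typing import Dict, Sequence, Tuple

DESCRIPTORS = ("E", "S", "A", "B", "V", "L")


def pplfer_log_k(system: Dict[str, float], solute: Dict[str, float]) -> float:
    """log K = c + sum of (system coefficient * solute descriptor).

    System coefficients are keyed by the lowercase descriptor letter; any
    missing term is treated as zero.
    """
    total = system.get("c", 0.0)
    for name in DESCRIPTORS:
        total += system.get(name.lower(), 0.0) * solute.get(name, 0.0)
    return total


def scaled_interval(prediction: float, half_width: float,
                    factor: float = 1.25) -> Tuple[float, float]:
    """Widen a symmetric prediction interval by an empirical scaling factor."""
    w = factor * half_width
    return prediction - w, prediction + w


def rmsep(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error of prediction."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))


def coverage(observed: Sequence[float], predicted: Sequence[float],
             half_width: float, factor: float = 1.25) -> float:
    """Fraction of observations that fall inside their scaled interval."""
    inside = 0
    for o, p in zip(observed, predicted):
        lo, hi = scaled_interval(p, half_width, factor)
        if lo <= o <= hi:
            inside += 1
    return inside / len(observed)


if __name__ == "__main__":
    # Placeholder system parameters and solute descriptors (not real values).
    system = {"c": 0.1, "e": 0.5, "s": -1.0, "a": 0.0, "b": -3.0, "v": 4.0}
    solute = {"E": 0.0, "S": 0.0, "A": 0.0, "B": 0.0, "V": 1.0}
    print(pplfer_log_k(system, solute))
    print(scaled_interval(2.0, 1.0))  # (0.75, 3.25)
```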

  • Article type: Journal Article
    Thermoelectric materials with a high Seebeck coefficient, high electrical conductivity, and low thermal conductivity are required to convert unused heat directly and efficiently into electricity. In this study, we constructed models that predict the Seebeck coefficient, electrical conductivity, and thermal conductivity using existing materials databases. In addition to the atomic ratios in the crystal and the temperature at which the material is used, values from X-ray diffraction (XRD) spectra were used as inputs to represent the crystal structure of each material. We confirmed that, with the XRD values as inputs, the constructed models predicted these properties with high accuracy. Using the constructed models, we also proposed promising new candidate materials with high Seebeck coefficients, high electrical conductivities, and low thermal conductivities.
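    The three target properties are conventionally combined into the dimensionless figure of merit zT = S²σT/κ, which is why a high Seebeck coefficient S, high electrical conductivity σ, and low thermal conductivity κ are sought together. The sketch below computes zT and assembles an input vector of the kind described (atomic ratios, temperature, XRD values). The element list, the fixed ordering, and the use of pre-binned XRD intensities are assumptions, since the abstract does not specify the encoding; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def figure_of_merit(seebeck: float, sigma: float, kappa: float,
                    temperature: float) -> float:
    """zT = S^2 * sigma * T / kappa.

    Units: S in V/K, sigma in S/m, kappa in W/(m K), T in K.
    """
    if kappa <= 0:
        raise ValueError("thermal conductivity must be positive")
    return seebeck ** 2 * sigma * temperature / kappa


def build_features(composition: Dict[str, float], temperature: float,
                   xrd_intensities: Sequence[float],
                   elements: Sequence[str]) -> List[float]:
    """Atomic fractions (fixed element order) + temperature + XRD values."""
    total = sum(composition.values())
    fractions = [composition.get(el, 0.0) / total for el in elements]
    return fractions + [temperature] + list(xrd_intensities)


if __name__ == "__main__":
    # S = 200 uV/K, sigma = 1e5 S/m, kappa = 1.5 W/(m K), T = 300 K
    print(figure_of_merit(200e-6, 1e5, 1.5, 300.0))  # approximately 0.8
    print(build_features({"Bi": 2, "Te": 3}, 300.0, [0.1, 0.5],
                         elements=["Bi", "Sb", "Te"]))
```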

  • Article type: Journal Article
    Aim: This study aims to investigate the passive diffusion of protein kinase inhibitors through the blood-brain barrier (BBB) and to develop a model for predicting their permeability. Materials & methods: We used the parallel artificial membrane permeability assay (PAMPA) to obtain logPe values for each of 34 compounds, and calculated descriptors for these structures to perform quantitative structure-property relationship (QSPR) modeling, creating several regression models. Results: logPe values were determined for all 34 compounds. Support vector machine regression was considered the most reliable model, and the CATS2D_09_DA, CATS2D_04_AA, B04[N-S], and F07[C-N] descriptors were identified as the most influential for passive BBB permeability. Conclusion: The resulting QSPR support vector machine regression model can serve as an efficient method for preliminary screening of the BBB permeability of new analogs.
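    PAMPA permeability is commonly computed from end-point donor and acceptor concentrations with Pe = -ln(1 - C_A/C_eq) / [A(1/V_D + 1/V_A)t], where C_eq = (C_D V_D + C_A V_A)/(V_D + V_A). The abstract does not give the equation or assay settings actually used, so the sketch below assumes this common form; the function name is hypothetical, and the membrane area, volumes, and incubation time in the usage lines are placeholders.

```python
import math


def pampa_log_pe(c_donor: float, c_acceptor: float, v_donor: float,
                 v_acceptor: float, area: float, time_s: float) -> float:
    """log10 of the effective permeability Pe (cm/s) from end-point concentrations.

    Volumes in cm^3, area in cm^2, time in s; concentrations in any
    consistent unit.
    """
    c_eq = (c_donor * v_donor + c_acceptor * v_acceptor) / (v_donor + v_acceptor)
    ratio = c_acceptor / c_eq
    if not 0.0 < ratio < 1.0:
        raise ValueError("acceptor concentration must be between 0 and C_eq")
    pe = -math.log(1.0 - ratio) / (area * (1.0 / v_donor + 1.0 / v_acceptor) * time_s)
    return math.log10(pe)


if __name__ == "__main__":
    # Placeholder assay: 0.3 cm^2 membrane, 0.2 mL per well, 4 h incubation.
    print(pampa_log_pe(0.8, 0.2, 0.2, 0.2, 0.3, 4 * 3600))  # roughly -4.93
```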

  • Article type: Journal Article
    Amid growing interest in natural surfactants, selecting the appropriate one for a specific application remains challenging. The extensive, yet often unsystematized, knowledge of microbial surfactants, predominantly represented by rhamnolipids (RLs), typically does not translate beyond the conditions presented in individual scientific publications. This limitation stems from the numerous interdependent variables that characterize microbial surfactant production. We hypothesized that a computational recipe for biosynthesizing RLs with targeted application properties could be developed from existing literature and experimental data. We amassed literature data on RL biosynthesis and micellar solubilization and augmented them with our own experimental results on the solubilization of triglycerides (TGs), a topic underrepresented in the current literature. Using these data, we constructed mathematical models that predict RL characteristics and solubilization efficiency, expressed as logPRL = f(carbon and nitrogen source, biosynthesis parameters) and logMSR = f(solubilizate, rhamnolipid (e.g., logPRL), solubilization parameters), respectively. The models, with R² values of 0.581-0.997 and 0.804, respectively, enabled the descriptors to be ranked by their significance and by their impact (positive or negative) on the predicted values. These models have been translated into ready-to-use calculators designed to streamline the selection of a biosurfactant optimally suited to the intended application.
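    MSR conventionally denotes the molar solubilization ratio, the slope of solubilizate solubility versus surfactant concentration above the critical micelle concentration (CMC): MSR = (S_tot - S_CMC)/(C_surf - CMC). The sketch below computes log MSR from such data and shows one way descriptors of a fitted linear model could be ranked by the magnitude and sign of their coefficients. The function names, descriptor names, and coefficient values are hypothetical; the study's calculators and fitted models are not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def log_msr(s_total: float, s_cmc: float, c_surf: float, cmc: float) -> float:
    """log10 of the molar solubilization ratio (all in the same molar unit)."""
    if c_surf <= cmc:
        raise ValueError("surfactant concentration must exceed the CMC")
    msr = (s_total - s_cmc) / (c_surf - cmc)
    if msr <= 0:
        raise ValueError("no solubilization above the CMC")
    return math.log10(msr)


def rank_descriptors(coefficients: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Rank descriptors by |coefficient| and label the direction of their effect.

    The ranking is only meaningful if descriptors were standardized before fitting.
    """
    ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
    out = []
    for name, coef in ranked:
        effect = "positive" if coef > 0 else "negative" if coef < 0 else "none"
        out.append((name, coef, effect))
    return out


if __name__ == "__main__":
    print(log_msr(6.0, 1.0, 12.0, 2.0))  # log10(0.5)
    # Hypothetical standardized coefficients of a logPRL-type model.
    print(rank_descriptors({"temperature": 0.3, "glycerol": -0.8, "pH": 0.1}))
```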

  • Article type: Journal Article
    Accurately predicting plant cuticle-air partition coefficients (Kca) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. The present work collected 255 measured Kca values covering 25 plant species and 106 compounds (dataset (I)) and averaged them to establish a dataset of Kca values for the 106 compounds (dataset (II)). Machine-learning algorithms (multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT)) were applied to develop eight QSPR models for predicting Kca. The developed models showed high goodness of fit, as well as good robustness and predictive performance. Owing to its superior performance, the GBDT-2 model (R²adj = 0.925, Q²LOO = 0.756, Q²BOOT = 0.864, R²ext = 0.837, Q²ext = 0.811, and CCC = 0.891) is recommended as the best model for predicting Kca. Moreover, interpreting the GBDT-1 and GBDT-2 models with the Shapley additive explanations (SHAP) method elucidated how molecular properties such as molecular size, polarizability, and molecular complexity affect the capacity of plant cuticles to adsorb organic pollutants from the air. The satisfactory performance of the developed models suggests that they could be widely applied in assessing the environmental fate of organic pollutants and in promoting eco-friendly and sustainable chemical engineering.
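    Two of the statistics quoted for GBDT-2 have simple closed forms: the adjusted R², R²adj = 1 - (1 - R²)(n - 1)/(n - p - 1) for p descriptors and n samples, and Lin's concordance correlation coefficient (CCC), which measures agreement with the 1:1 line rather than mere correlation. The sketch below implements these standard definitions; the abstract does not show the study's own code, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """R^2 adjusted for p descriptors fitted on n samples."""
    if n - p - 1 <= 0:
        raise ValueError("need more samples than descriptors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def concordance_cc(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient."""
    n = len(observed)
    mx, my = fmean(observed), fmean(predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted))
    sxx = sum((x - mx) ** 2 for x in observed)
    syy = sum((y - my) ** 2 for y in predicted)
    # The n * (mx - my)^2 term penalizes a systematic offset from the 1:1 line.
    return 2.0 * sxy / (sxx + syy + n * (mx - my) ** 2)


if __name__ == "__main__":
    print(adjusted_r2(0.9, 11, 2))                           # about 0.875
    print(concordance_cc([1, 2, 3, 4], [2, 3, 4, 5]))        # 10/14: offset by 1
```

    Note that a prediction shifted by a constant keeps a Pearson correlation of 1 but lowers the CCC, which is why CCC is often reported alongside Q²ext.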

  • Article type: Journal Article
    To analyze the persistence of veterinary pharmaceuticals and their ecotoxicological impact on different terrestrial species, veterinary pharmaceuticals from different classes (n = 37) with soil degradation data (DT50) were gathered and used to develop QSAR and q-RASAR models. The models were developed from 2D descriptors following Organisation for Economic Co-operation and Development (OECD) guidelines, using multiple linear regression combined with a genetic algorithm. All developed QSAR and q-RASAR models were statistically significant (internal: R²adj = 0.721-0.861, Q²LOO = 0.609-0.757; external: Q²Fn = 0.597-0.933, MAEext = 0.174-0.260). The leverage approach to the applicability domain further confirmed the reliability of the models. Veterinary pharmaceuticals without experimental values were classified by their persistence level. The terrestrial toxicity of the persistent veterinary pharmaceuticals was then analyzed using toxicity prediction by computer-assisted technology and in-house quantitative structure-toxicity relationship models, in order to prioritize toxic and persistent veterinary pharmaceuticals. This study will help estimate the persistence and toxicity of existing and upcoming veterinary pharmaceuticals.
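    The external metrics and the leverage-based applicability domain named above have standard definitions: Q²F1 and Q²F2 compare the external prediction error (PRESS) with the spread of the external responses around the training or external mean, Q²F3 normalizes per sample by the training variance, and a compound whose leverage h exceeds the warning value h* = 3(p + 1)/n lies outside the domain. The sketch below assumes these textbook forms and, for brevity, a single-descriptor model with intercept, where h reduces to 1/n + (x - x̄)²/Sxx; multi-descriptor models such as those in the study use the full hat matrix. Function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, Sequence


def leverage_1d(x_train: Sequence[float], x_query: float) -> float:
    """Leverage of a query compound for a one-descriptor model with intercept."""
    mean = fmean(x_train)
    sxx = sum((x - mean) ** 2 for x in x_train)
    return 1.0 / len(x_train) + (x_query - mean) ** 2 / sxx


def warning_leverage(n_train: int, n_descriptors: int) -> float:
    """Conventional applicability-domain threshold h* = 3(p + 1)/n."""
    return 3.0 * (n_descriptors + 1) / n_train


def q2_external(y_ext: Sequence[float], y_pred: Sequence[float],
                y_train: Sequence[float]) -> Dict[str, float]:
    """Q2_F1, Q2_F2 and Q2_F3 for an external validation set."""
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))
    mean_tr, mean_ext = fmean(y_train), fmean(y_ext)
    ss_f1 = sum((y - mean_tr) ** 2 for y in y_ext)   # spread around training mean
    ss_f2 = sum((y - mean_ext) ** 2 for y in y_ext)  # spread around external mean
    tss_tr = sum((y - mean_tr) ** 2 for y in y_train)
    return {
        "F1": 1.0 - press / ss_f1,
        "F2": 1.0 - press / ss_f2,
        "F3": 1.0 - (press / len(y_ext)) / (tss_tr / len(y_train)),
    }


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error (MAEext when applied to the external set)."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
    h_star = warning_leverage(len(x_train), 1)
    print(leverage_1d(x_train, 10.0) > h_star)  # True: outside the domain
    print(q2_external([2.0, 4.0], [3.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0]))
```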

  • Article type: Journal Article
    Tuberculosis (TB) is one of the most contagious diseases, with a higher mortality rate than HIV/AIDS, and TB cases are feared to rise as a repercussion of the COVID-19 pandemic. The pharmaceutical industry is constantly looking for ways to improve drug design processes, with the help of QSPR models, in order to combat the spread of infections and to cure newly identified syndromes or genetically based dysfunctions. QSPR models are mathematical tools that use structural properties to establish relationships between a molecular structure and its physicochemical attributes. Topological indices are such structural properties: they are computed from the molecular graph without any empirically derived measurements. This work focuses on developing QSPR models that use distance-based topological indices to relate anti-tuberculosis drugs to their diverse physicochemical properties.
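    Distance-based indices are computed from the matrix of shortest-path (topological) distances between atoms of the hydrogen-suppressed molecular graph. The abstract does not say which distance-based indices were used; as an illustration, the sketch below computes two classic ones, the Wiener index W = Σ_{i<j} d_ij and the Harary index H = Σ_{i<j} 1/d_ij, with breadth-first search. Function names are hypothetical, and the toy graphs stand in for drug structures.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Tuple


def distance_matrix(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """All-pairs topological distances (number of bonds) via BFS from each atom."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for s in range(n):
        d = [-1] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if d[w] < 0:  # first visit gives the shortest distance
                    d[w] = d[u] + 1
                    queue.append(w)
        if -1 in d:
            raise ValueError("molecular graph must be connected")
        dist.append(d)
    return dist


def wiener_index(dist: List[List[int]]) -> int:
    """Sum of distances over all unordered atom pairs."""
    return sum(dist[i][j] for i, j in combinations(range(len(dist)), 2))


def harary_index(dist: List[List[int]]) -> float:
    """Sum of reciprocal distances over all unordered atom pairs."""
    return sum(1.0 / dist[i][j] for i, j in combinations(range(len(dist)), 2))


if __name__ == "__main__":
    butane = distance_matrix(4, [(0, 1), (1, 2), (2, 3)])
    benzene = distance_matrix(6, [(i, (i + 1) % 6) for i in range(6)])
    print(wiener_index(butane), wiener_index(benzene))  # 10 27
    print(harary_index(butane))
```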