Molecular descriptor

  • Article type: Journal Article
    Hazard thresholds are lacking for many individual aromatic hydrocarbon compounds (AHCs) with diverse structures, which limits their ecological risk assessment. Thus, herein, quantitative structure-activity relationship (QSAR) models for estimating the hazard threshold of AHCs were developed based on the hazardous concentration for 5% of species (HC5) determined using the optimal species sensitivity distribution models and on the molecular descriptors calculated via the PaDEL and ORCA software. Results revealed that the optimal QSAR model, which involved eight descriptors, namely, Zagreb, GATS2m, VR3_Dzs, AATSC2s, GATS2c, ATSC2i, ω, and Vm, displayed excellent performance, as reflected by its goodness of fit (R2adj = 0.918), robustness (Q2LOO = 0.869), and external prediction ability (Q2F1 = 0.760, Q2F2 = 0.782, and Q2F3 = 0.774). The hazard thresholds estimated using the optimal QSAR model were close to the published water quality criteria developed by different countries and regions. The quantitative structure-toxicity relationship demonstrated that the molecular descriptors associated with electrophilicity and topological and electrotopological properties were important factors that affected the risks of AHCs. This study provides a new and reliable approach to estimating hazard thresholds for the ecological risk assessment of various aromatic hydrocarbon pollutants, which can be extended to similar contaminants with diverse structures.
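    The validation statistics quoted above (R2adj, Q2F1, Q2F2, Q2F3) follow definitions that are widely used in the QSAR literature. The sketch below shows one way to compute them, assuming those common definitions apply here; the function names and toy numbers are illustrative and are not taken from the study.

```python
def _mean(values):
    return sum(values) / len(values)


def adjusted_r2(r2, n_samples, n_descriptors):
    """Adjusted R^2: penalizes R^2 for the number of descriptors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_descriptors - 1)


def external_q2(y_train, y_ext, y_ext_pred):
    """Return (Q2F1, Q2F2, Q2F3) for an external prediction set.

    All three share the same numerator (prediction error sum of squares on
    the external set) and differ only in the reference sum of squares.
    """
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred))
    train_mean = _mean(y_train)
    ext_mean = _mean(y_ext)

    # Q2F1: external observations around the training-set mean.
    ss_f1 = sum((y - train_mean) ** 2 for y in y_ext)
    # Q2F2: external observations around the external-set mean.
    ss_f2 = sum((y - ext_mean) ** 2 for y in y_ext)
    # Q2F3: training-set variance, with both terms scaled by set size.
    ss_train = sum((y - train_mean) ** 2 for y in y_train)

    q2_f1 = 1.0 - press / ss_f1
    q2_f2 = 1.0 - press / ss_f2
    q2_f3 = 1.0 - (press / len(y_ext)) / (ss_train / len(y_train))
    return q2_f1, q2_f2, q2_f3


# Toy data (hypothetical log HC5 values, for illustration only).
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_ext = [2.0, 4.0]
y_ext_pred = [2.5, 3.5]
print(external_q2(y_train, y_ext, y_ext_pred))
```

    Reporting all three external metrics together is common practice because each uses a different baseline, so agreement among them (as in the abstract) suggests the result does not hinge on how the external set happens to be distributed.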






  • Article type: Journal Article
    BACKGROUND: Peru is one of the most biodiverse countries in the world, which is reflected in its wealth of knowledge about medicinal plants. However, there is a lack of information regarding intestinal absorption and the permeability of natural products. The human colon adenocarcinoma cell line (Caco-2) is used in an in vitro assay to measure apparent permeability. This study aims to develop a quantitative structure-property relationship (QSPR) model using machine learning algorithms to predict the Caco-2 apparent permeability of natural products from Peru.
    METHODS: A dataset of 1817 compounds, including experimental log Papp values and molecular descriptors, was utilized. Six QSPR models were constructed: a multiple linear regression (MLR) model, a partial least squares regression (PLS) model, a support vector machine regression (SVM) model, a random forest (RF) model, a gradient boosting machine (GBM) model, and an SVM-RF-GBM model.
    RESULTS: An evaluation of the testing set revealed that the MLR and PLS models exhibited an RMSE = 0.47 and R2 = 0.63. In contrast, the SVM, RF, and GBM models showcased an RMSE = 0.39-0.40 and R2 = 0.73-0.74. Notably, the SVM-RF-GBM model demonstrated superior performance, with an RMSE = 0.38 and R2 = 0.76. The model predicted log Papp values for 502 natural products falling within the applicability domain, with 68.9% (n = 346) showing high permeability, suggesting the potential for intestinal absorption. Additionally, we categorized the natural products into six metabolic pathways and assessed their drug-likeness.
    CONCLUSIONS: Our results provide insights into the potential intestinal absorption of natural products in Peru, thus facilitating drug development and pharmaceutical discovery efforts.
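    The abstract does not say how the SVM, RF, and GBM predictions are combined in the SVM-RF-GBM model. The sketch below assumes a simple unweighted average of the three models' predictions, which is one common way to build such a consensus, and shows how RMSE and R2 would be evaluated on a test set. All names and numbers are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination relative to the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def consensus(*model_predictions):
    """Unweighted average of per-compound predictions (assumed scheme)."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical log Papp values and per-model predictions for three compounds.
y_test = [1.0, 2.0, 3.0]
svm_pred = [1.2, 2.0, 2.7]
rf_pred = [0.9, 2.1, 3.0]
gbm_pred = [0.9, 1.9, 3.3]

ensemble_pred = consensus(svm_pred, rf_pred, gbm_pred)
print("SVM alone:", rmse(y_test, svm_pred), r2_score(y_test, svm_pred))
print("Consensus:", rmse(y_test, ensemble_pred), r2_score(y_test, ensemble_pred))
```

    Averaging tends to help when the individual models make partly uncorrelated errors, which is consistent with the modest gain over the single models reported in the RESULTS.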






  • Article type: Journal Article
    Umami peptides are known for enhancing the taste experience by binding to oral umami T1R1 and T1R3 receptors. Among them, small peptides (composed of 2-4 amino acids) constitute nearly 40% of reported umami peptides. Given the diversity in amino acids and peptide sequences, umami small peptides possess tremendous untapped potential. By investigating 168,400 small peptides, we screened candidates binding to T1R1/T1R3 through molecular docking and molecular dynamics simulations, explored bonding types, amino acid characteristics, preferred binding sites, etc. Utilizing three-dimensional molecular descriptors, bonding information, and a back-propagation neural network, we developed a predictive model with 90.3% accuracy, identifying 24,539 potential umami peptides. Clustering revealed three classes with distinct logP (-2.66 ± 1.02, -3.52 ± 0.93, -2.44 ± 1.23) and asphericity (0.28 ± 0.12, 0.26 ± 0.11, 0.25 ± 0.11), indicating significant differences in shape and hydrophobicity (P < 0.05) among potential umami peptides binding to T1R1/T1R3. Following clustering, nine representative peptides (CQ, DP, NN, CSQ, DMC, TGS, DATE, HANR, and STAN) were synthesized and confirmed to possess umami taste through sensory evaluations and electronic tongue analyses. In summary, this study provides insights into exploring small peptide interactions with umami receptors, advancing umami peptide prediction models.
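    The figure of 168,400 small peptides is consistent with every sequence of length 2 to 4 built from the 20 standard amino acids (20^2 + 20^3 + 20^4). The abstract does not state this derivation explicitly, so the sketch below treats the 20-letter alphabet as an assumption and simply checks the arithmetic.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_peptides(min_len=2, max_len=4):
    """Number of distinct sequences with lengths in [min_len, max_len]."""
    return sum(len(AMINO_ACIDS) ** k for k in range(min_len, max_len + 1))


def enumerate_peptides(length):
    """Lazily yield every peptide sequence of the given length."""
    for residues in product(AMINO_ACIDS, repeat=length):
        yield "".join(residues)


print(count_peptides())  # total di-, tri-, and tetrapeptides
print(sum(1 for _ in enumerate_peptides(2)))  # dipeptides only
```

    Enumerating lazily with a generator keeps memory use flat, which matters once each sequence is handed to a docking or descriptor pipeline.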






  • Article type: Journal Article
    Supramolecular chemistry is a fascinating field that explores the interactions between molecules to create higher-order structures. The supramolecular chain of Fuchsine acid, a dye molecule, has several possible chemical applications. Fuchsine acid helps to make better medicine carriers that deliver drugs where they're needed in the body, making treatments more effective and reducing side effects. It also helps create smart materials like sensors and self-fixing plastics, which are useful in electronics, keeping our environment clean, and making new materials. In sensing and detection, the supramolecular chain of Fuchsine acid can be utilized as a sensor or detector for specific analytes. In drug delivery, the supramolecular chains of Fuchsine acid can be incorporated into drug delivery systems. In recent years, a common approach has been to associate a graph with a chemical structure and study it using topological descriptors, a technique of growing importance. Topological descriptors give very useful information about the topology of a chemical graph. In this paper, we have computed the 3D structure of the supramolecular graph of Fuchsine acid. We have computed explicit expressions for the ABC index, GA index, general Randić index, first and second Zagreb indices, hyper Zagreb index, H-index and F-index of the supramolecular structure of Fuchsine acid.
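    The indices named above are all degree-based: each is a sum over the edges of the molecular graph of a simple function of the two end-vertex degrees. The sketch below computes them from an edge list using the standard textbook definitions. It assumes "H-index" refers to the harmonic index and "F-index" to the forgotten index, and it uses a six-membered ring as a toy graph rather than the Fuchsine acid structure itself.

```python
import math
from collections import Counter


def degree_map(edges):
    """Count how many edges touch each vertex."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def topological_indices(edges, alpha=-0.5):
    """Degree-based indices of a simple graph given as a list of edges."""
    deg = degree_map(edges)
    pairs = [(deg[u], deg[v]) for u, v in edges]
    return {
        # Sum of (du + dv) over edges equals the sum of squared degrees.
        "first_zagreb": sum(du + dv for du, dv in pairs),
        "second_zagreb": sum(du * dv for du, dv in pairs),
        "hyper_zagreb": sum((du + dv) ** 2 for du, dv in pairs),
        # Forgotten (F) index: equals the sum of cubed degrees.
        "forgotten": sum(du ** 2 + dv ** 2 for du, dv in pairs),
        # alpha = -1/2 gives the classical Randic index.
        "general_randic": sum((du * dv) ** alpha for du, dv in pairs),
        "abc": sum(math.sqrt((du + dv - 2) / (du * dv)) for du, dv in pairs),
        "ga": sum(2 * math.sqrt(du * dv) / (du + dv) for du, dv in pairs),
        "harmonic": sum(2 / (du + dv) for du, dv in pairs),
    }


# Toy example: a six-membered ring, where every vertex has degree 2.
ring = [(i, (i + 1) % 6) for i in range(6)]
for name, value in topological_indices(ring).items():
    print(name, round(value, 4))
```

    For a repeating supramolecular chain, closed-form expressions like those in the paper are usually obtained by counting how many edges fall into each (du, dv) degree class as a function of chain length and then applying the same per-edge formulas.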






  • Article type: Journal Article
    In the context of carbon neutrality and carbon peaking, molecular management has become a focus of the petrochemical industry. The key to achieving molecular management is molecular reconstruction, which relies on rapid and accurate calculation of oil properties. Focusing on naphtha, we proposed a novel procedure for constructing property prediction models (MDs-NP) that uses molecular dynamics simulations to collect properties and a gamma distribution derived from real analytical data to calculate the mole fractions of the simulated mixtures. We calculated 348 sets of mixture property data in the range of 273 K-300 K by molecular dynamics simulations. Molecular feature extraction was based on molecular descriptors. In addition to descriptors from open-source toolkits (RDKit and Mordred), we designed 12 naphtha knowledge (NK) descriptors specific to naphtha. Three machine learning algorithms (support vector regression, extreme gradient boosting and artificial neural network) were applied and compared to establish models for the prediction of the density and viscosity of naphtha. Mordred and NK descriptors combined with the support vector regression algorithm achieved the best performance for density. The selected RDKFp and NK descriptors combined with the artificial neural network algorithm achieved the best performance for viscosity. Ablation studies showed that T, P_w and CC(C)C are three effective NK descriptors that can improve the performance of the property prediction models. MDs-NP has the potential to be extended to more properties as well as more-complex petroleum systems. The models from MDs-NP can be used for rapid molecular reconstruction to facilitate construction of data-driven models and intelligent transformation of petrochemical processes.
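    The abstract does not give the parameterization of the gamma distribution used to assign mole fractions. The sketch below illustrates the general idea under stated assumptions: components are indexed by carbon number, a gamma density is evaluated at each carbon number (shifted by a hypothetical offset), and the resulting weights are normalized to sum to one. The shape, scale, and offset values are placeholders, not fitted values from the study.

```python
import math


def gamma_pdf(x, shape, scale):
    """Probability density of a gamma distribution (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def mole_fractions(carbon_numbers, shape, scale, offset=0.0):
    """Normalize gamma densities over the given carbon numbers."""
    weights = [gamma_pdf(n - offset, shape, scale) for n in carbon_numbers]
    total = sum(weights)
    if total == 0:
        raise ValueError("all components received zero weight")
    return [w / total for w in weights]


# Hypothetical naphtha-like cut spanning C4 to C10.
carbons = list(range(4, 11))
fractions = mole_fractions(carbons, shape=3.0, scale=1.5, offset=3.0)
for n, x in zip(carbons, fractions):
    print(f"C{n}: {x:.3f}")
```

    Sampling different shape and scale values in this way would yield a family of plausible mixture compositions, each of which could then be passed to a molecular dynamics run.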






  • Article type: Journal Article
    Drug-induced liver injury (DILI) is a major cause of drug development discontinuation and drug withdrawal from the market, but there are no gold-standard methods for DILI risk evaluation. Since we had found an association between DILI and CYP1A1 or CYP1B1 inhibition, we further evaluated the utility of cytochrome P450 (P450) inhibition assay data for DILI risk evaluation using decision tree analysis. The inhibitory activity of drugs with DILI concern (DILI drugs) and no DILI concern (no-DILI drugs) against 10 human P450s was assessed using recombinant enzymes and luminescent substrates. The drugs were also subjected to cytotoxicity assays and high-content analysis using HepG2 cells. Molecular descriptors were calculated by alvaDesc. Decision tree analysis was performed with the data obtained as variables, with or without P450-inhibitory activity, to discriminate between DILI drugs and no-DILI drugs. The accuracy was significantly higher when P450-inhibitory activity was included. After the decision tree discrimination, the drugs were further discriminated with the P450-inhibitory activity. The results demonstrated that many false-positive and false-negative drugs were correctly discriminated by using the P450 inhibition data. These results suggest that P450 inhibition assay data are useful for DILI risk evaluation.






  • Article type: Journal Article
    In cheminformatics, molecular fingerprints (FPs) are used in various tasks such as regression and classification. However, predictive models often underutilize Morgan FPs for regression and related tasks in machine learning. This study introduced descriptors derived from reshaped Morgan FPs using persistent homology to improve predictive accuracy. In the solvation free energy (FreeSolv) and water solubility (ESOL) datasets, persistent homology was found to enhance predictive accuracy compared to the use of only Morgan FPs. Notably, using the first-order persistence diagram (PD1) for descriptor generation resulted in more significant improvements than using the zeroth-order persistence diagram (PD0). Combining 4096-bit Morgan FPs with PD1-generated descriptors increased the average coefficient of determination in the Gaussian process regression from 0.597 to 0.667 for FreeSolv and from 0.629 to 0.654 for ESOL. Adjusting the grid size parameter during PD-based descriptor generation is crucial, as finer grids, especially with PD0, generate more descriptors but reduce predictive accuracy. Coarsening the grid or applying principal component analysis (PCA) mitigates overfitting and enhances accuracy. When descriptors were generated from Morgan FPs with randomly shuffled bit positions, coarsening the grid and/or applying PCA achieved accuracy improvements similar to those obtained when the persistent homology of the original Morgan FPs was used.






  • Article type: Journal Article
    We used the extreme gradient boosting (XGB) algorithm to predict the experimental solubility of chemical compounds in water and organic solvents and to select significant molecular descriptors. In the first step, the prediction accuracy of our forward stepwise top-importance XGB (FSTI-XGB) on curated solubility data sets, in terms of RMSE, was found to be 0.59-0.76 Log(S) for two water data sets, while for organic solvent data sets it was 0.69-0.79 Log(S) for the Methanol data set, 0.65-0.79 Log(S) for the Ethanol data set, and 0.62-0.70 Log(S) for the Acetone data set. In the second step, we used uncurated and curated AquaSolDB data sets for applicability domain (AD) tests of the Drugbank, PubChem, and COCONUT databases and determined that more than 95% of the ca. 500,000 compounds studied were within the AD. In the third step, we applied conformal prediction to obtain narrow prediction intervals, and we successfully validated them using the true solubility values of the test sets. In the fourth and last step, we used these prediction intervals to estimate individual error margins and the accuracy class of the solubility prediction for molecules within the AD of the three public databases. All of this was possible without knowledge of the experimental solubilities in those databases. We consider this four-step workflow novel because solubility-related works usually study only the first step or the first two steps.
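    The abstract does not specify which conformal prediction variant was used. The sketch below shows split (inductive) conformal prediction for regression, a common choice, as an illustration of how calibration-set residuals turn a point prediction into an interval with a target coverage of 1 - alpha. The residual values and the predicted Log(S) are made up for the example.

```python
import math


def conformal_quantile(calibration_residuals, alpha):
    """Residual quantile giving roughly (1 - alpha) coverage.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)); if that rank
    exceeds n, no finite interval can guarantee the requested coverage.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_residuals)[rank - 1]


def prediction_interval(y_pred, half_width):
    """Symmetric interval around a point prediction."""
    return (y_pred - half_width, y_pred + half_width)


# Hypothetical absolute errors |y - y_hat| on a held-out calibration set.
residuals = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 1.0, 0.4, 0.8, 0.6]
q = conformal_quantile(residuals, alpha=0.2)

# Interval for a new compound with a predicted Log(S) of -2.0.
print(prediction_interval(-2.0, q))
```

    Because the half-width comes only from calibration residuals, an interval can be attached to any compound inside the applicability domain without knowing its experimental solubility, which matches the use described in the fourth step.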






  • Article type: Journal Article
    The rising demand from the consumer goods and pharmaceutical industries is driving a fast expansion of newly developed chemicals. The conventional toxicity testing of unknown chemicals is expensive, time-consuming, and raises ethical concerns. The quantitative structure-property relationship (QSPR) is an efficient computational method because it saves time, resources, and animal experimentation. Advances in machine learning have improved chemical analysis in QSPR studies, but the real-world application of machine learning-based QSPR studies has been limited by the unexplainable 'black box' nature of machine learning models. In this study, a multi-encoder structure-to-toxicity (S2T)-transformer based QSPR model was developed to estimate the properties of polychlorinated biphenyls (PCBs) and endocrine disrupting chemicals (EDCs). Simplified molecular input line entry system (SMILES) strings and molecular descriptors calculated by the Dragon 6 software were simultaneously considered as inputs of the QSPR model. Furthermore, an attention-based framework is proposed to describe the relationship between the molecular structure and toxicity of hazardous chemicals. The S2T-transformer model achieved the highest R2 scores of 0.918, 0.856, and 0.907 for logarithm of octanol-water partition coefficient (Log KOW), octanol-air partition coefficient (Log KOA), and bioconcentration factor (Log BCF) estimation of PCBs, respectively. Moreover, the attention weights were able to properly interpret the lateral (meta, para) chlorination associated with PCBs toxicity and environmental impact.






  • Article type: Journal Article
    Transmembrane protease serine 2 (TMPRSS2) is an important drug target due to its role in the infection mechanism of coronaviruses including SARS-CoV-2. Current understanding regarding the molecular mechanisms of known inhibitors and insights required for inhibitor design are limited. This study investigates the effect of inhibitor binding on the intramolecular backbone hydrogen bonds (BHBs) of TMPRSS2 using the concept of hydrogen bond wrapping, which is the phenomenon of stabilization of a hydrogen bond in a solvent environment as a result of being surrounded by non-polar groups. A molecular descriptor which quantifies the extent of wrapping around BHBs is introduced for this. First, virtual screening for TMPRSS2 inhibitors is performed by molecular docking using the program DOCK 6 with a Generalized Born surface area (GBSA) scoring function. The docking results are then analyzed using this descriptor and its relationship to the solvent-accessible surface area term ΔGsa of the GBSA score is demonstrated with machine learning regression and principal component analysis. The effect of binding of the inhibitors camostat, nafamostat, and 4-guanidinobenzoic acid (GBA) on the wrapping of important BHBs in TMPRSS2 is also studied using molecular dynamics. For BHBs with a large increase in wrapping groups due to these inhibitors, the radial distribution function of water revealed that certain residues involved in these BHBs, like Gln438, Asp440, and Ser441, undergo preferential desolvation. The findings offer valuable insights into the mechanisms of these inhibitors and may prove useful in the design of new inhibitors.





