KNIME workflow

  • Article type: Journal Article
    Membrane permeability, expressed in vitro as the apparent permeability (Papp) of a compound, is a key absorption, distribution, metabolism, and excretion parameter in drug development. Although the Caco-2 cell line is the most widely used for measuring Papp, other cell lines, such as Madin-Darby Canine Kidney (MDCK), LLC-Pig Kidney 1 (LLC-PK1), and Ralph Russ Canine Kidney (RRCK), can also be used to estimate Papp. Constructing in silico models for Papp estimation based on the MDCK, LLC-PK1, and RRCK cell lines therefore requires collecting large amounts of in vitro Papp data. Open databases offer extensive measurements for compounds covering a vast chemical space; however, concerns have been reported about using data published in open databases without appropriate accuracy and quality checks. Ensuring the quality of datasets used to train in silico models is critical because artificial intelligence (AI, including deep learning) is used to develop models that predict various pharmacokinetic properties, and data quality affects the performance of these models. Hence, careful curation of the collected data is imperative. Herein, we developed a new KNIME workflow that supports automatic curation of Papp data measured in the MDCK, LLC-PK1, and RRCK cell lines and collected from ChEMBL. The workflow consisted of four main phases. Data were extracted from ChEMBL and filtered to identify the target protocols. A total of 1661 high-quality entries were retained after checking 436 articles. The workflow is freely available, can be updated, and is highly reusable. Our study provides a novel approach to data quality analysis and accelerates the development of useful in silico models for effective drug discovery. Scientific Contribution: The cost of building highly accurate predictive models can be significantly reduced by automating the collection of reliable measurement data. Our tool reduces the time and effort required for data collection and will enable researchers to focus on constructing high-performance in silico models for other types of analysis. To the best of our knowledge, no such tool is available in the literature.
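The abstract describes extracting Papp records from ChEMBL and filtering them to the target protocols. The filtering and merging logic can be sketched in plain Python. This is not the authors' KNIME workflow: the record keys (`compound_id`, `cell_line`, `standard_type`, `relation`, `value`, `units`), the unit labels, and the use of the median to merge replicate measurements are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median

# Assumption: hypothetical record layout loosely modelled on ChEMBL activity
# fields; a real export would need its columns mapped to these keys first.
TARGET_CELL_LINES = {"MDCK", "LLC-PK1", "RRCK"}

# Assumed unit labels -> factor converting to the common unit 1e-6 cm/s.
UNIT_TO_1E6_CM_S = {
    "1e-6 cm/s": 1.0,
    "cm/s": 1e6,
    "nm/s": 0.1,  # 1 nm/s = 1e-7 cm/s = 0.1 x 1e-6 cm/s
}


def curate_papp(records):
    """Return {(compound_id, cell_line): median Papp in 1e-6 cm/s}."""
    grouped = defaultdict(list)
    for rec in records:
        if rec.get("standard_type") != "Papp":
            continue  # not a permeability measurement
        if rec.get("cell_line") not in TARGET_CELL_LINES:
            continue  # e.g. Caco-2 rows are out of scope here
        if rec.get("relation") != "=":
            continue  # drop censored values such as '>' or '<'
        factor = UNIT_TO_1E6_CM_S.get(rec.get("units"))
        value = rec.get("value")
        if factor is None or value is None or value <= 0:
            continue  # unknown unit or implausible value
        grouped[(rec["compound_id"], rec["cell_line"])].append(value * factor)
    # Collapse replicate measurements to one robust value per compound/cell line.
    return {key: median(vals) for key, vals in grouped.items()}
```

Keying on both compound and cell line keeps MDCK, LLC-PK1, and RRCK values separate, since Papp measured in different cell lines is not directly interchangeable.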

  • Article type: Journal Article
    Biomedical information mining is increasingly recognized as a promising technique to accelerate drug discovery and development. In particular, integrative approaches that mine data from several (open) data sources have become more attractive with the increasing possibilities to access data programmatically through Application Programming Interfaces (APIs). The use of open data in conjunction with free, platform-independent analytic tools provides the additional advantages of flexibility, reusability, and transparency. Here, we present a strategy for performing ligand-based in silico drug repurposing with the analytics platform KNIME. We demonstrate the usefulness of the developed workflow on the basis of two different use cases: a rare disease (here: Glucose Transporter Type 1 (GLUT-1) deficiency) and a new disease (here: COVID-19). The workflow includes a targeted download of data through web services, data curation, detection of enriched structural patterns, and substructure searches in DrugBank and in a recently deposited data set of antiviral drugs provided by Chemical Abstracts Service. The developed workflows, tutorials with detailed step-by-step instructions, and the information gained by analyzing the data for GLUT-1 deficiency syndrome and COVID-19 are made freely available to the scientific community. The provided framework can be reused by researchers for other in silico drug repurposing projects, and it should serve as a valuable teaching resource for conveying integrative data mining strategies.
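One step in the workflow above is the detection of enriched structural patterns. As a minimal stdlib-only sketch, assuming each compound has already been reduced to a set of substructure keys by a cheminformatics toolkit, enrichment can be scored as a fragment's frequency among actives divided by its frequency in a reference set. The function name, the `min_count` cut-off, and the +1 smoothing are illustrative choices, not taken from the paper.

```python
from collections import Counter


def fragment_enrichment(active_sets, reference_sets, min_count=2):
    """Return {fragment: enrichment factor}, highest first.

    Enrichment factor = (fraction of actives containing the fragment) /
                        (smoothed fraction of reference compounds containing it).
    Fragments seen in fewer than `min_count` actives are ignored.
    """
    n_act, n_ref = len(active_sets), len(reference_sets)
    # Count each fragment at most once per compound.
    act_counts = Counter(f for s in active_sets for f in set(s))
    ref_counts = Counter(f for s in reference_sets for f in set(s))
    scores = {}
    for frag, a in act_counts.items():
        if a < min_count:
            continue
        act_frac = a / n_act
        # +1 smoothing avoids division by zero for fragments absent from the reference.
        ref_frac = (ref_counts[frag] + 1) / (n_ref + 1)
        scores[frag] = act_frac / ref_frac
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Highly ranked fragments are natural candidates for the subsequent substructure searches, for example against an approved-drug collection.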

  • Article type: Journal Article
    The taste of chemical compounds in food stimulates us to take in nutrients and to avoid poisons. However, taste perception depends strongly on genetic as well as evolutionary factors. The aim of this work was to develop and validate a machine learning model based on molecular fingerprints to discriminate between sweet and bitter molecules. BitterSweetForest is the first open-access model based on a KNIME workflow that provides a platform for predicting the bitter and sweet taste of chemical compounds using molecular fingerprints and a Random Forest classifier. The constructed model yielded an accuracy of 95% and an AUC of 0.98 in cross-validation. On an independent test set, BitterSweetForest achieved an accuracy of 96% and an AUC of 0.98 for bitter and sweet taste prediction. The model was further applied to predict the bitter and sweet taste of natural compounds, approved drugs, and an acute toxicity compound data set. BitterSweetForest predicts 70% of the natural product space as bitter and 10% as sweet with a confidence score of 0.60 or above. Of the approved drug set, 77% was predicted as bitter and 2% as sweet with a confidence score of 0.75 or above. Similarly, 75% of the compounds in the acute oral toxicity class were predicted only as bitter with a minimum confidence score of 0.75, indicating that toxic compounds are mostly bitter. Furthermore, we applied a Bayesian feature analysis method, using the feature space of a circular fingerprint, to identify the most frequently occurring chemical features that discriminate sweet from bitter compounds.
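The abstract does not specify its Bayesian feature analysis. One common way to score fingerprint features is the Laplacian-corrected naive Bayes weight, log((A_F + 1) / (T_F · P + 1)), where T_F is the number of compounds containing feature F, A_F is how many of those belong to the class of interest, and P is the baseline fraction of that class. The sketch below assumes fingerprints are given as sets of feature ids; treat it as an illustration of this technique, not as BitterSweetForest's code.

```python
import math
from collections import Counter


def laplacian_bayes_weights(fingerprints, labels, positive="sweet"):
    """Laplacian-corrected naive Bayes weight for each fingerprint feature.

    fingerprints: list of sets of feature ids (e.g. circular-fingerprint bits)
    labels: list of class labels, same length as fingerprints
    A positive weight means the feature is enriched in the `positive` class,
    a negative weight that it is depleted; zero means no evidence either way.
    """
    n = len(labels)
    p_base = sum(1 for y in labels if y == positive) / n  # baseline class rate
    total, pos = Counter(), Counter()
    for fp, y in zip(fingerprints, labels):
        for f in set(fp):
            total[f] += 1          # T_F: compounds containing the feature
            if y == positive:
                pos[f] += 1        # A_F: of those, how many are positive
    return {f: math.log((pos[f] + 1) / (total[f] * p_base + 1)) for f in total}
```

The +1 terms shrink weights for rare features toward zero, which keeps a bit seen in a single sweet compound from dominating the ranking.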

  • Article type: Journal Article
    Advances in drug discovery research depend substantially on in silico methods that capitalize on experimental data to enable accurate property/activity assessment using a variety of computational techniques. These in silico tools can significantly reduce the expensive and time-consuming experimental procedures otherwise required and are strongly recommended as a way to avoid animal testing, especially for toxicity evaluation and risk assessment. In this context, the present work aims to develop a predictive model for the cytotoxic effects of a wide range of compounds based solely on calculated molecular descriptors that account for their topological, geometric, and structural characteristics. The developed model was fully validated and released online via the Enalos Cloud platform, accessible at http://enalos.insilicotox.com/MouseTox/. This ready-to-use web service offers free access to the model results through a user-friendly interface and can therefore act as a toxicity prediction tool for the risk assessment of novel compounds, without any special requirements or prior programming skills.
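The abstract does not name the learning algorithm behind the model. Purely as an illustration of a descriptor-based predictor, the sketch below uses k-nearest-neighbour regression on precomputed descriptor vectors together with a simple distance-based applicability-domain flag. The function name, `k`, and `ad_threshold` are hypothetical, and nothing here reproduces the MouseTox model or the Enalos Cloud service.

```python
import math


def knn_predict(train_X, train_y, query, k=3, ad_threshold=None):
    """Average the responses of the k nearest training compounds.

    train_X: list of descriptor vectors (assumed already scaled)
    train_y: list of measured responses, same length
    Returns (prediction, inside_domain). When ad_threshold is given, the query
    is flagged as outside the applicability domain if its mean Euclidean
    distance to the k neighbours exceeds that threshold.
    """
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    prediction = sum(y for _, y in neighbours) / len(neighbours)
    mean_dist = sum(d for d, _ in neighbours) / len(neighbours)
    inside = ad_threshold is None or mean_dist <= ad_threshold
    return prediction, inside
```

Returning the domain flag alongside the prediction lets a web front end warn users when a novel compound lies far from the training chemistry.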

  • Article type: Journal Article
    Physiologically based kinetic (PBK) models and the virtual cell-based assay can be linked to form so-called physiologically based dynamic (PBD) models. This study illustrates the development and application of a PBK model for predicting estragole-induced DNA adduct formation and hepatotoxicity in humans. To address hepatotoxicity, HepaRG cells were used as a surrogate for liver cells, with cell viability as the in vitro toxicological endpoint. Information on DNA adduct formation was taken from the literature. Because estragole-induced cell damage is caused not directly by the parent compound but by a reactive metabolite, information on the metabolic pathway was incorporated into the model. In addition, a user-friendly tool was developed by implementing the PBK/D model in a KNIME workflow. This workflow can be used to perform in vitro to in vivo extrapolation and forward as well as backward dosimetry in support of chemical risk assessment.
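Forward dosimetry predicts internal exposure from an external dose; backward (reverse) dosimetry inverts that relation to find the dose that reaches a concentration observed to be effective in vitro. A minimal sketch, assuming a hypothetical one-compartment model with first-order absorption and saturable Michaelis-Menten metabolism, where the amount metabolised stands in for reactive-metabolite formation. All parameter values are invented placeholders, not estragole data, and the model is far simpler than the published PBK/D model.

```python
def simulate(dose_mg, ka_per_h=1.0, volume_l=50.0, vmax_mg_per_h=20.0,
             km_mg_per_l=5.0, t_end_h=24.0, dt_h=0.01):
    """Forward dosimetry with explicit Euler steps (hypothetical parameters).

    Returns (peak plasma concentration in mg/L, total amount metabolised in mg).
    """
    gut, body, metabolised, cmax = dose_mg, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end_h / dt_h))):
        conc = body / volume_l
        absorption = ka_per_h * gut                               # mg/h
        metabolism = vmax_mg_per_h * conc / (km_mg_per_l + conc)  # mg/h, saturable
        gut -= absorption * dt_h
        body += (absorption - metabolism) * dt_h
        metabolised += metabolism * dt_h
        cmax = max(cmax, body / volume_l)
    return cmax, metabolised


def reverse_dosimetry(target_cmax, lo=0.0, hi=1e4, tol=1e-6):
    """Backward dosimetry: bisect on dose until the simulated Cmax hits the target.

    Relies on Cmax increasing monotonically with dose, which holds for this model.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if simulate(mid)[0] < target_cmax:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With purely linear kinetics the backward step would reduce to a simple ratio, because Cmax would scale proportionally with dose; bisection is used here because saturable metabolism makes Cmax non-linear in dose.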

  • Article type: Journal Article
    Rapid safety assessment is increasingly needed by chemical industries and regulators around the world to cope with the growing number of chemicals, and traditional experimental methods can no longer meet the current demand. With the development of information technology and the growth of experimental data, in silico modeling has become a practical and rapid alternative for assessing chemical properties, especially for predicting the toxicity of organic chemicals. In this study, a quantitative regression workflow was built in KNIME to predict chemical properties. Unlike binary or multi-class classification models, which give only qualitative results, this regression workflow yields quantitative values of chemical properties. To illustrate its use, two predictive models were constructed based on datasets of Tetrahymena pyriformis toxicity and aqueous solubility. For both models, the q² values from 5-fold cross-validation (q²cv) and external validation (q²test) were greater than 0.7, which implies that the models are robust and reliable and that the workflow is convenient and efficient for predicting various chemical properties.
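The cross-validated q² is commonly defined as q² = 1 − PRESS / Σ(y − ȳ)², where PRESS sums the squared errors of predictions made for compounds held out of training. A stdlib-only sketch of 5-fold q² for a one-descriptor least-squares model; the interleaved fold assignment and the single-descriptor linear model are simplifying assumptions, not the paper's KNIME regression nodes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def q2_kfold(xs, ys, k=5):
    """q2 = 1 - PRESS / SS using k deterministic, interleaved folds."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]
    press = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Accumulate squared errors only for compounds the model never saw.
        press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx)
    mean_y = sum(ys) / n
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1 - press / ss
```

In practice folds are usually shuffled with a fixed random seed; interleaving simply keeps this example deterministic.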