graph neural networks

  • Article type: Journal Article
    We present a full space inverse materials design (FSIMD) approach that fully automates materials design for target physical properties without requiring the atomic composition, chemical stoichiometry, or crystal structure to be specified in advance. We used density functional theory reference data to train a universal machine learning potential (UPot) and used transfer learning to train a universal bulk modulus model (UBmod). Both UPot and UBmod cover materials systems composed of any of 42 elements. Interfaced with an optimization algorithm and enhanced sampling, the FSIMD approach was applied to find the materials with the largest cohesive energy and the largest bulk modulus, respectively. NaCl-type ZrC was found to have the largest cohesive energy, and diamond was identified as having the largest bulk modulus. The FSIMD approach is also applicable to designing materials with other multi-objective properties; its accuracy is limited principally by the amount, reliability, and diversity of the training data. It provides a new route to inverse materials design for other functional properties in practical applications.
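The two target properties above have standard textbook definitions in terms of energies that a potential such as UPot can supply. The sketch below is illustrative only: it assumes a hypothetical callable `energy(V)` returning the total energy (eV) of a cell at volume V (Å^3), computes the bulk modulus as B = V0 d²E/dV² by a central finite difference at the equilibrium volume, and computes a per-atom cohesive energy with the sign convention that a bound solid is positive. The abstract states that UBmod predicts bulk modulus directly via transfer learning, so this is not the paper's route; all function names are hypothetical.

```python
from typing import Callable, Dict

# 1 eV/Å^3 in GPa: 1.602176634e-19 J / 1e-30 m^3 = 160.2176634 GPa.
EV_PER_A3_TO_GPA = 160.2176634


def bulk_modulus(energy: Callable[[float], float], v0: float, dv: float = 1e-3) -> float:
    """B = V0 * d^2E/dV^2 at the equilibrium volume v0 (central difference).

    `energy` (hypothetical) maps a cell volume in Å^3 to a total energy in eV;
    the returned value is in eV/Å^3.
    """
    second_derivative = (energy(v0 + dv) - 2.0 * energy(v0) + energy(v0 - dv)) / dv ** 2
    return v0 * second_derivative


def cohesive_energy_per_atom(
    bulk_energy: float,
    composition: Dict[str, int],
    isolated_atom_energy: Dict[str, float],
) -> float:
    """(sum_i n_i * E_atom,i - E_bulk) / N, positive for a bound solid in this convention."""
    n_atoms = sum(composition.values())
    free_atoms = sum(n * isolated_atom_energy[el] for el, n in composition.items())
    return (free_atoms - bulk_energy) / n_atoms
```

Multiplying the result of `bulk_modulus` by `EV_PER_A3_TO_GPA` gives GPa. Sign conventions for cohesive energy differ between codes and datasets, so check which one a given reference uses before comparing numbers.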

  • Article type: Journal Article
    We present a novel computational approach for predicting human pharmacokinetics (PK) that addresses the challenges of early-stage drug design. Our study introduces and describes a large-scale data set of 11 clinical PK end points, encompassing over 2700 unique chemical structures, for training machine learning models. To that end, multiple advanced training strategies are compared, including the integration of in vitro data and a novel self-supervised pretraining task. In addition to predictions, our final model provides meaningful epistemic uncertainties for every data point. This allows us to identify regions of exceptional predictive performance, with an absolute average fold error (AAFE, also called the geometric mean fold error) of less than 2.5 across multiple end points. Together, these advances represent a significant step toward actionable PK predictions that can be used early in the drug design process to expedite development and reduce reliance on nonclinical studies.
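The AAFE threshold quoted above is easier to interpret with the formula at hand: AAFE = 10^(mean |log10(predicted / observed)|), i.e. the geometric mean of the fold errors max(p/o, o/p), so an AAFE of 2.5 means predictions are off by a factor of about 2.5 on (geometric) average. A minimal sketch, assuming strictly positive PK values; the function name is illustrative, not taken from the paper's code:

```python
import math
from typing import Sequence


def aafe(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Absolute average fold error: 10 ** mean(|log10(predicted / observed)|).

    Equivalent to the geometric mean of the fold errors max(p/o, o/p);
    1.0 corresponds to perfect predictions.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    if any(v <= 0 for v in list(predicted) + list(observed)):
        raise ValueError("PK values must be strictly positive")
    # Absolute log-ratios, so over- and under-predictions do not cancel out.
    log_errors = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(log_errors) / len(log_errors))
```

For example, `aafe([2.0, 5.0], [1.0, 10.0])` is approximately 2.0: one prediction is twofold too high and the other twofold too low. A signed mean of the log-ratios would cancel these to zero, which is why the absolute value is taken.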

  • Article type: Journal Article
    Identifying the catalytic regioselectivity of enzymes remains a challenge. Compared with experimental trial-and-error approaches, computational methods such as molecular dynamics simulations provide valuable insights into enzyme characteristics. However, without adequate modeling techniques, the massive amount of data generated by these simulations makes it difficult to extract knowledge about enzyme catalytic mechanisms. Here, we propose a computational framework that applies graph-based active learning to molecular dynamics data to identify the regioselectivity of ginsenoside hydrolases (GHs), which selectively act at the C6 or C20 position to produce rare deglycosylated bioactive compounds from Panax plants. Experimental results show that the dynamics-aware graph model distinguishes GH regioselectivity with accuracies as high as 96-98%, even when different enzyme-substrate systems exhibit similar dynamic behaviors. The active learning strategy makes the model robust while reducing its reliance on dynamic data, indicating that it can mine sufficient knowledge from short multi-replica simulations. Moreover, the model's interpretability allowed us to identify crucial residues and features associated with regioselectivity. Our findings contribute to the understanding of GH catalytic mechanisms and provide direct assistance for rational design to improve regioselectivity. We present a general computational framework for modeling enzyme catalytic specificity from simulation data, paving the way for further integration of experimental and computational approaches in enzyme optimization and design.
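The abstract does not state which acquisition criterion the active learning strategy uses. As a hedged illustration only, the sketch below shows generic pool-based uncertainty sampling: a hypothetical classifier `predict_proba` returns the probability that a candidate (for example, a graph built from one simulation replica) belongs to the C6-selective class, and the candidates with the highest predictive entropy are chosen for labeling or further simulation.

```python
import math
from typing import Callable, Hashable, List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a two-class prediction such as P(C6-selective)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_for_labeling(
    pool: Sequence[Hashable],
    predict_proba: Callable[[Hashable], float],
    k: int,
) -> List[Hashable]:
    """Return the k pool items whose predictions are most uncertain (highest entropy).

    `predict_proba` is a hypothetical stand-in for the trained graph model.
    """
    ranked = sorted(pool, key=lambda item: binary_entropy(predict_proba(item)), reverse=True)
    return list(ranked[:k])
```

In a full loop, the selected items would be labeled, added to the training set, and the model retrained before the next round. Entropy is one common choice of criterion, not necessarily the one the authors used.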

  • Article type: Journal Article
    Cancer research encompasses data across various scales, modalities, and resolutions, from screening and diagnostic imaging to digitized histopathology slides to various types of molecular data and clinical records. Integrating these diverse data types for personalized cancer care and predictive modeling promises to enhance the accuracy and reliability of cancer screening, diagnosis, and treatment. Traditional analytical methods, which often focus on isolated or unimodal information, fall short of capturing the complex and heterogeneous nature of cancer data. The advent of deep neural networks has spurred the development of sophisticated multimodal data fusion techniques capable of extracting and synthesizing information from disparate sources. Among these, Graph Neural Networks (GNNs) and Transformers have emerged as powerful tools for multimodal learning, demonstrating significant success. This review presents the foundational principles of multimodal learning, including oncology data modalities, a taxonomy of multimodal learning, and fusion strategies. We examine recent advances in GNNs and Transformers for fusing multimodal data in oncology, spotlighting key studies and their main findings. We discuss the unique challenges of multimodal learning, such as data heterogeneity and integration complexity, alongside the opportunities it offers for a more nuanced and comprehensive understanding of cancer. Finally, we present some of the latest comprehensive multimodal pan-cancer data sources. By surveying the landscape of multimodal data integration in oncology, we aim to underline the transformative potential of multimodal GNNs and Transformers. Through the technological advances and methodological innovations presented in this review, we aim to chart a course for future research in this promising field. This review may be the first to highlight the current state of multimodal modeling applications in cancer using GNNs and Transformers; it also presents comprehensive multimodal oncology data sources and sets the stage for the evolution of multimodal approaches, encouraging further exploration and development in personalized cancer care.
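Two of the most common categories in multimodal fusion taxonomies are early (feature-level) fusion, which concatenates modality features before a single model, and late (decision-level) fusion, which combines the outputs of per-modality models. The sketch below illustrates only these two; intermediate fusion, where GNN or Transformer layers exchange information between modality-specific representations, is not shown. The modality names and weights are hypothetical, and the code is not taken from the review.

```python
from typing import Dict, List, Optional, Sequence


def early_fusion(features: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Feature-level fusion: concatenate per-modality vectors in a fixed modality order."""
    fused: List[float] = []
    for modality in order:
        fused.extend(features[modality])
    return fused


def late_fusion(scores: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Decision-level fusion: weighted average of per-modality model outputs (e.g. risk scores)."""
    if weights is None:
        weights = {modality: 1.0 for modality in scores}
    total = sum(weights[modality] for modality in scores)
    return sum(weights[modality] * score for modality, score in scores.items()) / total
```

Early fusion lets one model learn cross-modal interactions but requires all modalities for every patient; late fusion tolerates a missing modality more easily (drop its term) at the cost of modeling interactions only at the decision level.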

  • Article type: Journal Article
    Recent technological advances have enabled spatially resolved transcriptomic profiling at a more cost-effective, multicellular resolution. Cell type deconvolution was introduced to disentangle the discrete cell types within such multicellular spots. However, existing benchmark datasets for cell type deconvolution are either generated by simulation or limited in scale, predominantly cover mouse data, and are not designed for human immuno-oncology. To overcome these limitations and promote comprehensive investigation of cell type deconvolution for human immuno-oncology, we introduce a large-scale spatial transcriptomic deconvolution benchmark dataset named SpatialCTD, encompassing 1.8 million cells and 12,900 pseudo spots from the human tumor microenvironment across the lung, kidney, and liver. In addition, for most reference-based deconvolution methods, SpatialCTD provides a more realistic reference than those generated from single-cell RNA sequencing (scRNA-seq) data. To exploit the location-aware SpatialCTD reference, we propose a graph neural network-based deconvolution method, GNNDeconvolver. Extensive experiments show that GNNDeconvolver often outperforms existing state-of-the-art methods by a substantial margin, without requiring scRNA-seq data. To enable comprehensive evaluation of spatial transcriptomics data from flexible protocols, we provide an online tool that converts spatial transcriptomic data from various platforms (e.g., 10× Visium, MERFISH, and sci-Space) into pseudo spots with adjustable spot size. The SpatialCTD dataset and GNNDeconvolver implementation are available at https://github.com/OmicsML/SpatialCTD, and the online converter tool can be accessed at https://omicsml.github.io/SpatialCTD/.
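The online converter is described only as turning spatial data into pseudo spots of adjustable size. One plausible way to do this, sketched below under that assumption, is to tile the tissue with square spots of side `spot_size` (in the same units as the cell coordinates), sum the transcript counts of the cells falling in each spot, and record the cell-type proportions, which serve as the deconvolution ground truth. The `Cell` record and `make_pseudo_spots` are hypothetical; the actual SpatialCTD tool may use a different spot geometry or aggregation rule.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

SpotKey = Tuple[int, int]  # (column, row) index of a square spot on the grid


@dataclass
class Cell:
    x: float                # spatial coordinate, same unit as spot_size
    y: float
    cell_type: str          # annotated cell type (deconvolution ground truth)
    counts: Dict[str, int]  # gene -> transcript count


def make_pseudo_spots(
    cells: List[Cell], spot_size: float
) -> Dict[SpotKey, Tuple[Dict[str, int], Dict[str, float]]]:
    """Bin cells into square spots; return summed expression and cell-type proportions per spot."""
    if spot_size <= 0:
        raise ValueError("spot_size must be positive")
    expression: Dict[SpotKey, Counter] = defaultdict(Counter)
    type_counts: Dict[SpotKey, Counter] = defaultdict(Counter)
    for cell in cells:
        key = (math.floor(cell.x / spot_size), math.floor(cell.y / spot_size))
        expression[key].update(cell.counts)       # add this cell's transcripts to the spot
        type_counts[key][cell.cell_type] += 1     # count cells per type in the spot
    spots = {}
    for key, types in type_counts.items():
        n_cells = sum(types.values())
        proportions = {t: n / n_cells for t, n in types.items()}
        spots[key] = (dict(expression[key]), proportions)
    return spots
```

Increasing `spot_size` mixes more cells into each spot, which is the knob that makes the resulting benchmark harder or easier for deconvolution methods.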

  • Article type: Journal Article
    We present a novel graph-based approach for labeling the anatomical branches of a given airway tree segmentation. The proposed method formulates airway labeling as a branch classification problem on the airway tree graph, where branch features are extracted using convolutional neural networks and enriched using graph neural networks. Our graph neural network is structure-aware, because each node aggregates information from its local neighbors, and position-aware, because node positions in the graph are encoded. We evaluated the proposed method on 220 airway trees from subjects with various severity stages of chronic obstructive pulmonary disease (COPD). The results demonstrate that our approach is computationally efficient and significantly improves branch classification performance over the baseline method. The overall average accuracy of our method reaches 91.18% for labeling 18 segmental airway branches, compared with 83.83% for the standard CNN method and 87.37% for the existing method. Furthermore, a reader study on an additional set of 40 subjects shows that our algorithm performs comparably to human experts in labeling segmental airways. Our source code is published at https://github.com/DIAGNijmegen/spgnn, and the proposed algorithm is also publicly available at https://grand-challenge.org/algorithms/airway-anatomical-labeling/.
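The two properties named above, structure awareness (each node aggregates from its local neighbors) and position awareness (node positions are encoded), can be illustrated on a toy airway tree whose nodes are branches and whose edges link parent and child branches. The sketch is a simplified assumption: it uses hop distance from the root branch (the trachea) as the positional encoding and a parameter-free mean aggregation. The paper's actual encoding, learnable weights, and CNN features are not reproduced, and all names are hypothetical.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]       # undirected adjacency: branch -> neighbouring branches
Features = Dict[int, List[float]]  # branch -> feature vector (e.g. CNN output)


def hop_distance(graph: Graph, root: int) -> Dict[int, int]:
    """Breadth-first hop distance of every branch from the root branch."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def add_position(features: Features, graph: Graph, root: int) -> Features:
    """Position awareness: append each branch's hop distance from the root to its features."""
    dist = hop_distance(graph, root)
    return {n: list(f) + [float(dist[n])] for n, f in features.items()}


def mean_aggregate(graph: Graph, features: Features) -> Features:
    """Structure awareness: average each node's vector with those of its neighbours."""
    out: Features = {}
    for node, nbrs in graph.items():
        group = [features[node]] + [features[n] for n in nbrs]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out
```

Stacking several aggregation layers lets information from more distant branches reach each node, while the appended position keeps symmetric-looking branches at different depths distinguishable.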

  • Article type: Journal Article
    While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, neighborhood fetching in GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. A feasible solution is graph knowledge distillation (KD), which trains high-performance student multi-layer perceptrons (MLPs) to replace GNNs by mimicking the superior outputs of teacher GNNs. However, state-of-the-art graph knowledge distillation methods are mainly based on distilling deep features from intermediate hidden layers, so the importance of logit-layer distillation has been largely overlooked. To provide a new viewpoint on logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts: the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN's prediction confidence and the NCGD loss, and eliminate the fixed weighting between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments on public benchmark datasets show the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD.
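The decomposition described above can be checked numerically. Assuming the classical loss is the KL divergence between softened teacher and student outputs, it splits exactly as KL(p^T || p^S) = TCGD + (1 - p_t^T) NCGD, where TCGD compares the binary [target, non-target] distributions and NCGD compares the distributions renormalized over the non-target classes. The factor (1 - p_t^T) shrinks as the teacher grows more confident, which is the negative correlation DGKD removes. In this sketch, `alpha` and `beta` are fixed, hypothetical weights standing in for DGKD's per-sample weighting, which the abstract does not specify; the usual T^2 loss scaling is also omitted.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def split_kd(teacher_logits: Sequence[float], student_logits: Sequence[float],
             target: int, temperature: float = 1.0) -> Tuple[float, float, float]:
    """Return (target-class term, non-target-class term, teacher confidence p_t)."""
    pt = softmax(teacher_logits, temperature)
    ps = softmax(student_logits, temperature)
    # Binary distributions [p(target), p(any other class)].
    tc = kl([pt[target], 1.0 - pt[target]], [ps[target], 1.0 - ps[target]])
    # Distributions renormalized over the non-target classes only.
    nt = [p / (1.0 - pt[target]) for k, p in enumerate(pt) if k != target]
    ns = [p / (1.0 - ps[target]) for k, p in enumerate(ps) if k != target]
    return tc, kl(nt, ns), pt[target]


def classical_kd(teacher_logits: Sequence[float], student_logits: Sequence[float],
                 temperature: float = 1.0) -> float:
    return kl(softmax(teacher_logits, temperature), softmax(student_logits, temperature))


def decoupled_kd(teacher_logits: Sequence[float], student_logits: Sequence[float],
                 target: int, alpha: float, beta: float, temperature: float = 1.0) -> float:
    """alpha * TC + beta * NC: the non-target weight no longer shrinks as p_t grows."""
    tc, nc, _ = split_kd(teacher_logits, student_logits, target, temperature)
    return alpha * tc + beta * nc
```

For instance, with teacher logits `[2.0, 1.0, 0.1]`, student logits `[0.5, 1.5, 0.2]`, and target 0, `classical_kd` agrees with `tc + (1 - p_t) * nc` from `split_kd` up to floating-point rounding.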

  • Article type: Journal Article
    Drug targets are the main focus of drug discovery because of their key role in disease pathogenesis. Computational approaches are widely applied to drug development owing to the increasing availability of biological molecular datasets. Popular generative approaches can create new drug molecules by learning given molecular distributions; however, most of them are not designed for target-specific drug discovery. We developed an energy-based probabilistic model, TagMol, for computational target-specific drug discovery. Results show that TagMol can generate molecules with binding affinity scores similar to those of real molecules. Graph attention network (GAT)-based models learned faster and better than graph convolutional network (GCN) baseline models.
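For readers comparing the GAT and GCN baselines mentioned above: a GCN weights neighbors by fixed, degree-based coefficients, whereas a graph attention (GAT) layer learns them as alpha_ij = softmax_j(LeakyReLU(a · [W h_i || W h_j])). The single-head sketch below follows that standard formulation; it is not TagMol's implementation, the energy-based generative model is not shown, and `w` and `a` would normally be learned parameters rather than the fixed values used here.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def matvec(w: Sequence[Sequence[float]], h: Sequence[float]) -> Vector:
    return [sum(wij * hj for wij, hj in zip(row, h)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_attention(i: int, neighbours: Sequence[int], h: Dict[int, Vector],
                  w: Sequence[Sequence[float]], a: Sequence[float]) -> Dict[int, float]:
    """alpha_ij = softmax over j of LeakyReLU(a . [W h_i || W h_j])."""
    wh_i = matvec(w, h[i])
    logits = [leaky_relu(sum(ak * ck for ak, ck in zip(a, wh_i + matvec(w, h[j]))))
              for j in neighbours]
    m = max(logits)  # stabilize the softmax
    exps = [math.exp(e - m) for e in logits]
    z = sum(exps)
    return {j: e / z for j, e in zip(neighbours, exps)}


def gat_update(i: int, neighbours: Sequence[int], h: Dict[int, Vector],
               w: Sequence[Sequence[float]], a: Sequence[float]) -> Vector:
    """h_i' = sum_j alpha_ij W h_j (single head, no output nonlinearity)."""
    alpha = gat_attention(i, neighbours, h, w, a)
    out = [0.0] * len(w)
    for j in neighbours:
        for k, v in enumerate(matvec(w, h[j])):
            out[k] += alpha[j] * v
    return out
```

With `a` set to all zeros every neighbor receives the same weight, so the update reduces to a plain average of the transformed neighbor features; learning `a` lets the layer weight atoms unequally.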

  • Article type: Journal Article
    A biochemical pathway consists of a series of interconnected biochemical reactions that together accomplish specific life activities. The reactants and products participating in a pathway, including gene fragments, proteins, and small molecules, coalesce to form a complex reaction network. Biochemical pathways play a critical role in biochemistry because they reveal the flow of biochemical reactions in living organisms, making them essential for understanding life processes. Existing studies of biochemical pathway networks rely mainly on experimentation and pathway database analysis, which are constrained by substantial costs. Inspired by the success of representation learning approaches in biomedicine, we develop the biochemical pathway prediction (BPP) platform, an automatic platform for identifying potential links or attributes within biochemical pathway networks. BPP incorporates a variety of representation learning models, including recent hypergraph neural network techniques, to model biochemical reactions in pathways. In particular, BPP contains the latest biochemical pathway-based datasets and enables the prediction of potential participants or products of biochemical reactions in pathways. Additionally, BPP is equipped with a SHAP explainer to explain the predicted results and calculate the contribution of each participating element. We conduct extensive experiments on our collected biochemical pathway dataset to benchmark the effectiveness of all models available on BPP. Furthermore, detailed case studies based on the chronological pattern of our dataset demonstrate the effectiveness of the platform. The BPP web portal, source code, and datasets are freely accessible at https://github.com/Glasgow-AI4BioMed/BPP.
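Hypergraphs suit reaction networks because a single reaction can involve more than two molecules: each reaction becomes a hyperedge joining all of its participants. The sketch below shows one generic, parameter-free round of hypergraph message passing (node features averaged into each hyperedge, then hyperedge messages averaged back into the nodes). It is an illustration under that assumption, not one of the models shipped with BPP, and the species and feature values in the example are made up.

```python
from typing import Dict, List, Sequence


def hypergraph_mean_pass(
    hyperedges: Dict[str, Sequence[str]], features: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """One node -> hyperedge -> node round of mean aggregation."""
    dim = len(next(iter(features.values())))
    # Stage 1: each reaction (hyperedge) summarizes its participants.
    edge_msg = {
        e: [sum(features[v][k] for v in members) / len(members) for k in range(dim)]
        for e, members in hyperedges.items()
    }
    # Record which reactions each molecule takes part in.
    incident: Dict[str, List[str]] = {v: [] for v in features}
    for e, members in hyperedges.items():
        for v in members:
            incident[v].append(e)
    # Stage 2: each molecule averages the messages of its reactions.
    out: Dict[str, List[float]] = {}
    for v, edges in incident.items():
        if not edges:  # molecule in no reaction: keep its features unchanged
            out[v] = list(features[v])
        else:
            out[v] = [sum(edge_msg[e][k] for e in edges) / len(edges) for k in range(dim)]
    return out
```

For example, with a hexokinase reaction {glucose, ATP, G6P, ADP} and an isomerase reaction {G6P, F6P}, glucose-6-phosphate receives information from both reactions, which a pairwise graph would only capture through several separate edges.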

  • Article type: Journal Article
    The development of new drugs is a vital effort that has the potential to improve human health, well-being, and life expectancy. Molecular property prediction is a crucial step in drug discovery, as it helps identify potential therapeutic compounds. However, experimental methods for drug development are often time-consuming and resource-intensive, with a low probability of success. To address these limitations, deep learning (DL) methods have emerged as a viable alternative owing to their ability to identify highly discriminative patterns in molecular data. In particular, graph neural networks (GNNs) operate on graph-structured data to identify promising drug candidates with desirable molecular properties. These methods represent molecules as sets of node (atom) and edge (chemical bond) features and aggregate local information for molecular graph representation learning. Although several GNN frameworks are available, each has its own shortcomings: a GNN that excels at certain tasks may not perform as well on others. In this work, we propose a hybrid approach that combines different graph-based methods to exploit their strengths and mitigate their limitations, in order to predict molecular properties accurately. The proposed approach consists of a multi-layered hybrid GNN architecture that integrates multiple GNN frameworks to compute graph embeddings for molecular property prediction. Furthermore, we conduct extensive experiments on multiple benchmark datasets, showing that our hybrid approach significantly outperforms state-of-the-art graph-based models. The data and code scripts to reproduce the results are available in the repository at https://github.com/pedro-quesado/HybridGNN.
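The abstract does not specify how HybridGNN combines its GNN frameworks. As one hedged illustration of the general idea, the sketch encodes a molecule as atom feature rows plus a bond list, runs two different message-passing schemes over it (a GIN-style sum update and a GCN-style normalized update, both without learnable weights), pools each to a graph-level vector, and concatenates the results so that a downstream property predictor sees both views. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # one row of features per atom


def adjacency(n_atoms: int, bonds: Sequence[Tuple[int, int]]) -> List[List[int]]:
    adj: List[List[int]] = [[] for _ in range(n_atoms)]
    for i, j in bonds:  # bonds are undirected
        adj[i].append(j)
        adj[j].append(i)
    return adj


def sum_layer(x: Matrix, adj: List[List[int]], eps: float = 0.0) -> Matrix:
    """GIN-style update (1 + eps) * x_i + sum_j x_j, with the usual MLP omitted."""
    return [
        [(1.0 + eps) * x[i][k] + sum(x[j][k] for j in adj[i]) for k in range(len(x[i]))]
        for i in range(len(x))
    ]


def gcn_layer(x: Matrix, adj: List[List[int]]) -> Matrix:
    """GCN-style update with self-loops and symmetric degree normalization (weights omitted)."""
    deg = [len(nbrs) + 1 for nbrs in adj]
    return [
        [sum(x[j][k] / math.sqrt(deg[i] * deg[j]) for j in [i] + adj[i]) for k in range(len(x[i]))]
        for i in range(len(x))
    ]


def sum_readout(x: Matrix) -> List[float]:
    """Graph-level embedding by summing atom vectors."""
    return [sum(col) for col in zip(*x)]


def hybrid_embedding(x: Matrix, bonds: Sequence[Tuple[int, int]]) -> List[float]:
    """Concatenate graph embeddings produced by two different message-passing schemes."""
    adj = adjacency(len(x), bonds)
    return sum_readout(sum_layer(x, adj)) + sum_readout(gcn_layer(x, adj))
```

For ethanol's heavy atoms (C, C, O, one-hot encoded as `[1, 0]`, `[1, 0]`, `[0, 1]`) with bonds `(0, 1)` and `(1, 2)`, `hybrid_embedding` returns a 4-dimensional vector whose first two entries, `[5.0, 2.0]`, come from the sum-based branch.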
