Computational biology

  • Article type: Journal Article
    The force developed by actively lengthened muscle depends on different structures across different scales of lengthening. For small perturbations, the active response of muscle is well captured by a linear time-invariant (LTI) system: a stiff spring in parallel with a light damper. The force response of muscle to longer stretches is better represented by a compliant spring that can fix its end when activated. Experimental work has shown that the stiffness and damping (impedance) of muscle in response to small perturbations are of fundamental importance to motor learning and mechanical stability, while the huge forces developed during long active stretches are critical for simulating and predicting injury. Outside of motor learning and injury, muscle is actively lengthened as a part of nearly all terrestrial locomotion. Despite the functional importance of impedance and active lengthening, no single muscle model has all these mechanical properties. In this work, we present the viscoelastic-crossbridge active-titin (VEXAT) model, which can replicate the response of muscle to length changes great and small. To evaluate the VEXAT model, we compare its response to that of biological muscle by simulating experiments that measure the impedance of muscle and the forces developed during long active stretches. We also compare the responses of the VEXAT model to those of a popular Hill-type muscle model. The VEXAT model captures the impedance of biological muscle and its responses to long active stretches more accurately than a Hill-type model, and can still reproduce the force-velocity and force-length relations of muscle. While the comparison between the VEXAT model and biological muscle is favorable, some aspects of the model can be improved: its low-frequency phase response and a mechanism to support passive force enhancement.
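
    The small-perturbation description above (a stiff spring in parallel with a light damper) can be illustrated with a minimal sketch. This is not the VEXAT model itself, only the generic LTI element the abstract names. The function names and the stiffness and damping values are hypothetical, chosen purely for illustration.

```python
import cmath
import math


def lti_force(k, beta, dx, dv):
    """Force change of a spring (stiffness k) in parallel with a damper (beta).

    dx is the length perturbation and dv is its rate of change.
    """
    return k * dx + beta * dv


def complex_stiffness(k, beta, freq_hz):
    """Frequency response F(jw)/X(jw) = k + j*w*beta of the parallel pair."""
    omega = 2.0 * math.pi * freq_hz
    return complex(k, omega * beta)


def gain_and_phase(k, beta, freq_hz):
    """Return (gain, phase in degrees) of the complex stiffness."""
    h = complex_stiffness(k, beta, freq_hz)
    return abs(h), math.degrees(cmath.phase(h))


# Hypothetical parameters: stiffness in N/mm, damping in N*s/mm.
K_STIFF = 5.0
BETA_LIGHT = 0.01

for f in (1.0, 10.0, 100.0):
    g, p = gain_and_phase(K_STIFF, BETA_LIGHT, f)
    print(f"{f:6.1f} Hz: gain {g:.3f} N/mm, phase {p:.1f} deg")
```

    With a stiff spring and a light damper, the gain stays close to the spring stiffness at low frequencies and the phase lead grows with frequency, which is the kind of impedance signature the simulated perturbation experiments compare against.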

  • Article type: Journal Article
    Most computational methods for predicting driver mutations have been trained using positive samples, while negative samples are typically derived from statistical methods or putative samples. The representativeness of these negative samples in capturing the diversity of passenger mutations remains to be determined. To tackle these issues, we curated a balanced dataset comprising driver mutations sourced from the COSMIC database and high-quality passenger mutations obtained from the Cancer Passenger Mutation database. Subsequently, we encoded the distinctive features of these mutations. Using feature correlation analysis and feature selection with the ensemble learning technique XGBoost, we developed a cancer driver missense mutation predictor called CDMPred. The proposed CDMPred method, using the top 10 features and XGBoost, achieved area under the receiver operating characteristic curve (AUC) values of 0.83 and 0.80 on the training and independent test sets, respectively. Furthermore, CDMPred demonstrated superior performance compared to existing state-of-the-art methods for cancer-specific and general diseases, as measured by AUC and area under the precision-recall curve. Including high-quality passenger mutations in the training data proves advantageous for CDMPred's prediction performance. We anticipate that CDMPred will be a valuable tool for predicting cancer driver mutations, furthering our understanding of personalized therapy.
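
    The AUC metric reported above can be computed directly from labels and predicted scores. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen positive scores higher than a randomly chosen negative). The labels and scores are made-up toy values, not CDMPred output, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = driver mutation, 0 = passenger mutation (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

    The pairwise loop is quadratic in the number of samples. It is fine for a small illustration, while a sort-based implementation would be preferable for large datasets.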

  • Article type: Journal Article
    BACKGROUND: Platelet function is driven by the expression of specialized surface markers. The concept of distinct circulating subpopulations of platelets has emerged in recent years, but their exact nature remains debatable.
    OBJECTIVES: To design a spectral flow cytometry-based phenotyping workflow to provide a more comprehensive characterization, at a global and individual level, of surface markers in resting and activated healthy platelets, and to apply this workflow to investigate how responses differ according to platelet age.
    METHODS: A 14-marker flow cytometry panel was developed and applied to vehicle- or agonist-stimulated platelet-rich plasma and whole blood samples obtained from healthy volunteers, or to platelets sorted according to SYTO-13 (Thermo Fisher Scientific) staining intensity as an indicator of platelet age. Data were analyzed using both user-led and independent approaches incorporating novel machine learning-based algorithms.
    RESULTS: The assay detected differences in marker expression in healthy platelets, at rest and on agonist activation, in both platelet-rich plasma and whole blood samples, that are consistent with the literature. Machine learning identified stimulated populations of platelets with high accuracy (>80%). Similarly, machine learning differentiation between young and old platelet populations achieved 76% accuracy, primarily weighted by forward scatter, cluster of differentiation (CD) 41, side scatter, glycoprotein VI, CD61, and CD42b expression patterns.
    CONCLUSION: Our approach provides a powerful phenotypic assay coupled with robust bioinformatic and machine learning workflows for deep analysis of platelet subpopulations. Cleavable receptors, glycoprotein VI and CD42b, contribute to defining shared and unique subpopulations. This adoptable, low-volume approach will be valuable in deep characterization of platelets in disease.

  • Article type: Journal Article
    Accurate prediction of molecular properties is crucial in drug discovery. Traditional methods often overlook that real-world molecules typically exhibit multiple property labels with complex correlations. To this end, we propose a novel framework, HiPM, which stands for Hierarchical Prompted Molecular representation learning framework. HiPM leverages task-aware prompts to enhance the differential expression of tasks in molecular representations and to mitigate negative transfer caused by conflicts in individual task information. Our framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atom and motif levels. Meanwhile, TAP uses an agglomerative hierarchical clustering algorithm to construct a prompt tree that reflects task affinity and distinctiveness, enabling the model to consider multi-granular correlation information among tasks and thereby handle the complexity of multi-label property prediction effectively. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a novel perspective on multi-label molecular representation learning.
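
    To make the prompt-tree idea concrete, the following is a minimal sketch of agglomerative hierarchical clustering over tasks. It is a generic average-linkage version and is not taken from HiPM. The task names and the pairwise distance matrix are hypothetical; in practice the distances would come from some measure of task affinity.

```python
def agglomerative_tree(names, dist):
    """Average-linkage agglomerative clustering.

    names: list of leaf labels; dist: symmetric matrix of pairwise distances.
    Returns a nested-tuple tree whose leaves are the labels.
    """
    # Each cluster is (subtree, list of member indices into dist).
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def linkage(a, b):
        # Mean distance over all cross-cluster pairs of members.
        pairs = [(i, j) for i in a for j in b]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x][1], clusters[y][1])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = (
            (clusters[x][0], clusters[y][0]),
            clusters[x][1] + clusters[y][1],
        )
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical tasks: two toxicity labels and two solubility labels.
task_names = ["tox_a", "tox_b", "sol_a", "sol_b"]
task_dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
tree = agglomerative_tree(task_names, task_dist)
print(tree)
```

    Closely related tasks end up under a shared internal node, so a prompt attached to that node can carry information common to the group while leaf-level prompts keep task-specific detail.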

  • Article type: Journal Article
    To more accurately diagnose and treat patients with different subtypes of thyroid cancer (THCA), we constructed a diagnostic model related to the iodine metabolism of THCA subtypes. THCA expression profiles, corresponding clinicopathological information, and single-cell RNA-seq data were downloaded from the TCGA and GEO databases. Genes related to the thyroid differentiation score (TDS) were obtained by GSVA. The diagnostic model was then constructed through logistic analyses. DCA curves, ROC curves, machine learning, and K-M analysis were used to verify the accuracy of the model. qRT-PCR was used to verify the expression of hub genes in vitro. There were 104 intersecting genes between different TDS and THCA subtypes. Finally, 5 genes (ABAT, CHEK1, GPX3, NME5, and PRKCQ) that could independently predict the TDS subpopulation were obtained, and a diagnostic model was constructed. ROC, DCA, and RCS curves showed that the model predicts accurately. K-M and subgroup analysis results showed that low model scores were strongly associated with poor PFI in THCA patients. The model score was significantly negatively correlated with follicular helper T cells. In addition, the diagnostic model was significantly negatively correlated with immune scores. Finally, the qRT-PCR results were consistent with the bioinformatics results. This diagnostic model has good diagnostic and prognostic value for THCA patients and can be used as an independent prognostic indicator.

  • Article type: Journal Article
    Abnormalities in coagulation and fibrinolytic status have been demonstrated to be relevant to inflammatory bowel disease. Nevertheless, no study has systematically examined the role of coagulation- and fibrinolysis-related genes in the diagnosis of ulcerative colitis (UC). UC-related datasets (GSE169568 and GSE94648) were obtained from the Gene Expression Omnibus database. Biomarkers related to coagulation and fibrinolysis were identified by combining differential expression analysis with machine learning algorithms. Moreover, Gene Set Enrichment Analysis and immune analysis were carried out. A total of 4 biomarkers (MAP2K1, CREBBP, TAF1, and HP) were identified, and they were markedly enriched in immunity-related pathways, such as the T-cell receptor signaling pathway, primary immunodeficiency, and the chemokine signaling pathway. The infiltrating abundance of 4 immune cell types differed markedly between UC and control samples: eosinophils, M0 macrophages, resting mast cells, and regulatory T cells. All biomarkers were significantly associated with eosinophils. Our findings identify 4 coagulation- and fibrinolysis-related biomarkers (MAP2K1, CREBBP, TAF1, and HP) for UC, which may support further clinical investigation of UC.

  • Article type: Journal Article
    Primary aldosteronism (PA) and obstructive sleep apnea (OSA) are both considered independent risk factors for hypertension, which can lead to an increase in cardiovascular disease incidence and mortality. Clinical studies have found a bidirectional relationship between OSA and PA. However, the underlying mechanism linking them is not yet clear. This study aims to investigate the shared genetic characteristics and potential molecular mechanisms of PA and OSA. We obtained microarray datasets of aldosterone-producing adenoma (APA) and OSA from the Gene Expression Omnibus (GEO) database. Weighted gene co-expression network analysis (WGCNA) was used to select co-expression modules associated with APA and OSA, and the common genes of the two diseases were obtained by intersection. Subsequently, hub genes for APA and OSA were identified through functional enrichment analysis, protein-protein interaction (PPI) analysis, datasets, and public databases. Finally, we predicted the transcription factors (TFs) and miRNAs of the hub genes. In total, 52 common genes were obtained by WGCNA. The Gene Ontology (GO) terms of the common genes include interleukin-1 response, cytokine activity, and chemokine receptor binding. Functional enrichment analysis highlighted TNF signaling, IL-17 signaling, and cytokine-cytokine receptor interactions as related to APA and OSA. Through verification with PPI, datasets, and public databases, we identified 5 hub genes shared between APA and OSA (IL6, ATF3, PTGS2, CCL2, and CXCL2). Through bioinformatics analysis, we found that the 2 disorders showed relative similarity in terms of inflammation, stress, and impaired immune function. The identification of hub genes may offer potential biomarkers for the diagnosis and prognosis of PA and OSA.

  • Article type: Journal Article
    Advances in single-cell transcriptomics provide an unprecedented opportunity to explore complex biological processes. However, computational methods for analyzing single-cell transcriptomics still have room for improvement, especially in dimension reduction, cell clustering, and cell-cell communication inference. Herein, we propose a versatile method, named DcjComm, for comprehensive analysis of single-cell transcriptomics. Using a joint learning model based on non-negative matrix factorization, DcjComm detects functional modules to explore expression patterns and performs dimension reduction and clustering to discover cellular identities. DcjComm then infers cell-cell communication by integrating ligand-receptor pairs, transcription factors, and target genes. DcjComm demonstrates superior performance compared to state-of-the-art methods.
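
    For readers unfamiliar with non-negative matrix factorization, the sketch below shows the plain technique with the standard multiplicative update rules, applied to a tiny gene-by-cell matrix. It is not DcjComm's joint learning model, which the abstract does not specify in detail. The matrix values, rank, and function names are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x cells) as W H.

    W is genes x rank (modules) and H is rank x cells (cell loadings).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


# Toy expression matrix: 4 genes x 4 cells with two obvious blocks.
V = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
]
W, H = nmf(V, rank=2)
# Assign each cell to the module with the largest loading in H.
cell_clusters = [max(range(2), key=lambda a: H[a][j]) for j in range(len(V[0]))]
print("reconstruction error:", frobenius_error(V, W, H))
print("cell cluster labels:", cell_clusters)
```

    Here the columns of W act as gene modules and the columns of H give a low-dimensional representation of each cell, so a single factorization serves both dimension reduction and clustering.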

  • Article type: Journal Article
    OBJECTIVE: To ascertain the connection between cuproptosis-related genes (CRGs) and the prognosis of hepatocellular carcinoma (HCC) via single-cell RNA sequencing (scRNA-seq) and RNA sequencing (RNA-seq) data. Relevant data were downloaded from the GEO and TCGA databases. The differentially expressed CRGs (DE-CRGs) were filtered by the overlaps in differentially expressed genes (DEGs) between HCC patients and normal controls (NCs) in the scRNA-seq database, DE-CRGs between high- and low-CRG-activity cells, and DEGs between HCC patients and NCs in the TCGA database.
    RESULTS: Thirty-three DE-CRGs in HCC were identified. A prognostic model (PM) was created using six survival-related genes (SRGs) (NDRG2, CYB5A, SOX4, MYC, TM4SF1, and IFI27) via univariate Cox regression analysis and LASSO. The predictive ability of the model was validated via a nomogram and receiver operating characteristic curves. Tumor immune dysfunction and exclusion analysis was used to examine the influence of the PM on immunological heterogeneity. Macrophage M0 levels were significantly different between the high-risk group (HRG) and the low-risk group (LRG), and higher macrophage levels were linked to a less favorable prognosis. The drug sensitivity data indicated a substantial difference in the half-maximal inhibitory concentrations of idarubicin and rapamycin between the HRG and the LRG. The model was verified using public datasets and our cohort at both the protein and mRNA levels.
    CONCLUSIONS: A PM using 6 SRGs (NDRG2, CYB5A, SOX4, MYC, TM4SF1, and IFI27) was developed via bioinformatics research. This model might provide a fresh perspective for assessing and managing HCC.

  • Article type: Journal Article
    Many biological problems are understudied due to experimental limitations and human biases. Although deep learning is promising in accelerating scientific discovery, its power is compromised when it is applied to problems with scarce labeled data and data distribution shifts. We develop a deep learning framework, Meta Model Agnostic Pseudo Label Learning (MMAPLE), to address these challenges by effectively exploring out-of-distribution (OOD) unlabeled data when conventional transfer learning fails. MMAPLE is unique in integrating the concepts of meta-learning, transfer learning, and semi-supervised learning into a unified framework. The power of MMAPLE is demonstrated in three applications in an OOD setting where chemicals or proteins in unseen data are dramatically different from those in training data: predicting drug-target interactions, hidden human metabolite-enzyme interactions, and understudied interspecies microbiome metabolite-human receptor interactions. MMAPLE achieves 11% to 242% improvement in the prediction-recall on multiple OOD benchmarks over various base models. Using MMAPLE, we reveal novel interspecies metabolite-protein interactions that are validated by activity assays and fill in missing links in microbiome-human interactions. MMAPLE is a general framework to explore previously unrecognized biological domains beyond the reach of present experimental and computational techniques.