Keywords: BERT; Molecular motif; Multi-view task; Toxicity identification

MeSH: Models, Chemical; Algorithms

Source: DOI:10.1016/j.jhazmat.2024.135114

Abstract:
Toxicity identification plays a key role in protecting human health, because it can alert people to the potential hazards of long-term exposure to a wide variety of chemical compounds. Experimental methods for determining toxicity are time-consuming and costly, whereas computational methods offer an alternative for the early identification of toxicity. For example, some classical machine learning (ML) and deep learning (DL) methods demonstrate excellent performance in toxicity prediction. However, these methods also have drawbacks, such as over-reliance on manually engineered features and a tendency to overfit. Proposing novel models with superior prediction performance therefore remains an urgent task. In this study, we propose a motif-level, graph-based, multi-view pretraining language model, called 3MTox, for toxicity identification. The 3MTox model uses Bidirectional Encoder Representations from Transformers (BERT) as the backbone framework and takes a motif graph as input. Extensive experiments showed that 3MTox achieved state-of-the-art performance on toxicity benchmark datasets and outperformed the baseline models considered. In addition, the interpretability of the model ensures that it can quickly and accurately identify toxicity sites in a given molecule, thereby supporting the determination of toxicity status and associated analyses. We believe that the 3MTox model is among the most promising tools currently available for toxicity identification.