The U.S. Environmental Protection Agency's Endocrine Disruptor Screening Program (EDSP) Tier 1 assays are used to screen for potential endocrine system-disrupting chemicals. A model integrating data from 16 high-throughput screening assays to predict estrogen receptor (ER) agonism has been proposed as an alternative to some low-throughput Tier 1 assays. Later work demonstrated that as few as four assays could replicate the ER agonism predictions of the full model with 98% sensitivity and 92% specificity. The current study used chemical clustering for two purposes: to illustrate how much of the EDSP Universe of Chemicals (UoC) has been tested in the existing ER pathway models, and to investigate the utility of chemical clustering for evaluating a screening approach, with an existing 4-assay model as the test case. Although the full original assay battery is no longer available, the demonstrated contribution of chemical clustering is broadly applicable to assay sets, chemical inventories, and models, and the data analysis used here can also be applied to future evaluations of minimal assay models under consideration for screening.
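For reference, the sensitivity and specificity figures above follow the standard definitions, with the full ER agonist model treated as the reference classification. The symbols TP, FP, TN, and FN (true/false positives and negatives of the 4-assay model relative to the full model) are notation introduced here for illustration, not taken from the source:

$$
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
$$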
Chemical structures were collected via the CompTox Chemicals Dashboard for 6,947 of the more than 10,000 substances in the UoC and grouped by structural similarity, generating 826 chemical clusters. Of the 1,812 substances run in the original ER model, 1,730 had a single, clearly defined structure. ER model chemicals with a clearly defined structure that were not present in the EDSP UoC were assigned to chemical clusters using a k-nearest neighbors approach, resulting in 557 EDSP UoC clusters containing at least one ER model chemical.
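The k-nearest neighbors assignment step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source does not state the fingerprint type, the similarity metric, or the value of k, so the Tanimoto similarity on sets of fingerprint bits, the default `k=3`, the majority-vote rule, and the function names `tanimoto` and `assign_cluster` are all assumptions made for the example.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_cluster(query: frozenset, clustered: list, k: int = 3):
    """Assign `query` to the majority cluster among its k most similar neighbors.

    `clustered` is a list of (fingerprint, cluster_id) pairs for chemicals
    that already belong to a cluster (hypothetical data layout).
    """
    # Rank already-clustered chemicals by similarity to the query, most similar first.
    ranked = sorted(clustered, key=lambda item: tanimoto(query, item[0]), reverse=True)
    top = ranked[:k]
    votes = Counter(cluster_id for _, cluster_id in top)
    best_count = max(votes.values())
    # On a tie, prefer the tied cluster whose member is most similar to the query.
    for _, cluster_id in top:
        if votes[cluster_id] == best_count:
            return cluster_id


# Toy fingerprints: two clusters with two members each (illustrative only).
clustered = [
    (frozenset({1, 2, 3, 4}), "A"),
    (frozenset({1, 2, 3, 5}), "A"),
    (frozenset({10, 11, 12}), "B"),
    (frozenset({10, 11, 13}), "B"),
]
print(assign_cluster(frozenset({1, 2, 3}), clustered))  # A
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the voting logic is independent of that choice.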
The performance of an existing 4-assay model relative to the existing full ER agonist model was analyzed in relation to chemical clustering. This was a case study, and a similar analysis can be performed with any subset model in which the same chemicals (or a subset of them) are screened. Of the 365 clusters containing more than one ER model chemical, 321 did not contain any chemical predicted to be an agonist by the full ER agonist model. The best 4-assay subset ER agonist model disagreed with the full ER agonist model by predicting agonist activity for 122 chemicals from 91 of these 321 clusters. The remaining 44 clusters contained at least two chemicals and at least one agonist according to the full ER agonist model, which allowed accuracy to be calculated on a per-cluster basis. The accuracy of the best 4-assay subset ER agonist model ranged from 50% to 100% across these 44 clusters, with 32 clusters having accuracy ≥90%. Overall, the best 4-assay subset ER agonist model produced 122 false-positive and only 2 false-negative predictions compared with the full ER agonist model. Most false positives (89) were active in only two of the four assays, whereas all but 11 true-positive chemicals were active in at least three assays. False-positive chemicals also tended to have lower area under the curve (AUC) values: 110 of the 122 false positives had an AUC value below 0.214, a value lower than the AUC of 75% of the positives predicted by the full ER agonist model. Many false positives showed borderline activity. The median AUC value for the 122 false positives from the best 4-assay subset ER agonist model was 0.138, whereas the threshold for an active prediction is 0.1.
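The per-cluster comparison described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' code: the record layout `(cluster_id, full_active, subset_active)`, the function names, and the toy data are hypothetical. The filtering rule (at least two chemicals and at least one full-model agonist per cluster) follows the text.

```python
from collections import defaultdict


def confusion_counts(records):
    """Count TP/FP/TN/FN for the subset model, using the full model as reference."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for _, full_active, subset_active in records:
        if full_active:
            counts["TP" if subset_active else "FN"] += 1
        else:
            counts["FP" if subset_active else "TN"] += 1
    return counts


def per_cluster_accuracy(records, min_size=2):
    """Fraction of chemicals per cluster on which the two models agree.

    Only clusters with at least `min_size` chemicals and at least one
    full-model agonist are scored, mirroring the criterion in the text.
    """
    by_cluster = defaultdict(list)
    for cluster_id, full_active, subset_active in records:
        by_cluster[cluster_id].append((full_active, subset_active))

    accuracy = {}
    for cluster_id, pairs in by_cluster.items():
        has_agonist = any(full for full, _ in pairs)
        if len(pairs) < min_size or not has_agonist:
            continue
        agree = sum(1 for full, subset in pairs if full == subset)
        accuracy[cluster_id] = agree / len(pairs)
    return accuracy


# Toy records (illustrative only): (cluster_id, full_active, subset_active)
records = [
    ("c1", True, True), ("c1", False, True),
    ("c2", True, True), ("c2", False, False), ("c2", True, True),
    ("c3", False, True), ("c3", False, False),   # no full-model agonist: not scored
    ("c4", True, False),                          # singleton cluster: not scored
]
print(per_cluster_accuracy(records))  # {'c1': 0.5, 'c2': 1.0}
print(confusion_counts(records))      # {'TP': 3, 'FP': 2, 'TN': 2, 'FN': 1}
```

Clusters without any full-model agonist are excluded from the accuracy table but still contribute to the overall false-positive count, which is how the text reports them.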
Our results show that the existing 4-assay model performs well across a range of structurally diverse chemicals. Although this is a descriptive analysis of previous results, several concepts can be applied to any screening model used in the future. First, clustering the chemicals provides a means of ensuring that future screening evaluations consider the broad chemical space represented by the EDSP UoC. The clusters can also help prioritize chemicals for future screening based on the activity of known chemicals in the same cluster. Finally, the clustering approach provides a framework for evaluating which portions of the EDSP UoC chemical space are reliably covered by in silico and in vitro approaches, and where predictions from either method alone or from both methods combined are most reliable. The lessons learned from this case study can be readily applied when evaluating the applicability of models and screening approaches to future datasets.