MeSH: Cluster Analysis; Machine Learning; Unsupervised Machine Learning; Toxicology / methods; Algorithms

Source: DOI:10.1289/EHP14001

Abstract:
The field of toxicology has witnessed substantial advancements in recent years, particularly with the adoption of new approach methodologies (NAMs) to understand and predict chemical toxicity. Class-based methods such as clustering and classification are key to NAMs development and application, aiding the understanding of hazard and risk concerns associated with groups of chemicals without additional laboratory work. Advances in computational chemistry, data generation and availability, and machine learning algorithms represent important opportunities for continued improvement of these techniques to optimize their utility for specific regulatory and research purposes. However, due to their intricacy, deep understanding and careful selection are imperative to align the adequate methods with their intended applications.
This commentary aims to deepen the understanding of class-based approaches by elucidating the pivotal role of chemical similarity (structural and biological) in clustering and classification approaches (CCAs). It addresses the dichotomy between general end point-agnostic similarity, often entailing unsupervised analysis, and end point-specific similarity necessitating supervised learning. The goal is to highlight the nuances of these approaches, their applications, and common misuses.
Understanding similarity is pivotal in toxicological research involving CCAs. The effectiveness of these approaches depends on the right definition and measure of similarity, which varies based on context and objectives of the study. This choice is influenced by how chemical structures are represented and the respective labels indicating biological activity, if applicable. The distinction between unsupervised clustering and supervised classification methods is vital, requiring the use of end point-agnostic vs. end point-specific similarity definition. Separate use or combination of these methods requires careful consideration to prevent bias and ensure relevance for the goal of the study. Unsupervised methods use end point-agnostic similarity measures to uncover general structural patterns and relationships, aiding hypothesis generation and facilitating exploration of datasets without the need for predefined labels or explicit guidance. Conversely, supervised techniques demand end point-specific similarity to group chemicals into predefined classes or to train classification models, allowing accurate predictions for new chemicals. Misuse can arise when unsupervised methods are applied to end point-specific contexts, like analog selection in read-across, leading to erroneous conclusions. This commentary provides insights into the significance of similarity and its role in supervised classification and unsupervised clustering approaches. https://doi.org/10.1289/EHP14001.
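The contrast between end point-agnostic and end point-specific similarity can be made concrete with a minimal sketch. Nothing below comes from the commentary itself: the fingerprints, the labels, the helper names (`tanimoto`, `endpoint_bits`, `nearest`), and the rule used to pick end point-relevant bits are all hypothetical, and the bit-selection rule is a deliberately crude stand-in for what a supervised model would learn from labeled data. The sketch only shows that the two similarity definitions can nominate different analogs for the same query chemical.

```python
# Hypothetical toy data: each chemical is a set of "on" fingerprint bits.
train = {
    "t1": {1, 2, 3, 4},
    "t2": {1, 2, 5, 9},
    "t3": {5, 6, 7, 9},
    "t4": {5, 6, 7, 8},
}
# Hypothetical end point labels: 1 = active, 0 = inactive.
labels = {"t1": 0, "t2": 1, "t3": 1, "t4": 0}
query = {1, 2, 3, 4, 9}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def endpoint_bits(train, labels):
    """Bits present in every active and in no inactive (crude, label-driven selection)."""
    actives = [fp for name, fp in train.items() if labels[name] == 1]
    inactives = [fp for name, fp in train.items() if labels[name] == 0]
    return set.intersection(*actives) - set.union(*inactives)


def nearest(query, train, bits=None):
    """Most similar training chemical; if `bits` is given, compare on those bits only."""
    def sim(fp):
        if bits is None:
            return tanimoto(query, fp)            # end point-agnostic: all bits count
        return tanimoto(query & bits, fp & bits)  # end point-specific: selected bits only
    # Ties are broken by dictionary order (first maximum wins).
    return max(train, key=lambda name: sim(train[name]))


agnostic_analog = nearest(query, train)
specific_bits = endpoint_bits(train, labels)
specific_analog = nearest(query, train, bits=specific_bits)

print(agnostic_analog, labels[agnostic_analog])  # t1 0
print(sorted(specific_bits))                     # [9]
print(specific_analog, labels[specific_analog])  # t2 1
```

In this toy case the whole-fingerprint comparison selects an inactive analog (`t1`), while the comparison restricted to the label-derived bit selects an active one (`t2`). The sketch does not say which choice is correct; it illustrates why analog selection for a specific end point depends on how similarity is defined.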