Keywords: quantitative explainability measurement; XAI; explainable artificial intelligence; predictive modeling

Source: DOI:10.1080/26896583.2024.2340391

Abstract:
In the rapidly evolving field of artificial intelligence (AI), explainability has traditionally been assessed after modeling and is often judged subjectively. In contrast, many quantitative metrics are routinely used to assess a model's performance. We propose a unified formula named PERForm, which incorporates explainability as a weight into existing statistical metrics to provide an integrated, quantitative measure of both predictivity and explainability that can guide model selection, application, and evaluation. PERForm was designed as a generic formula and can be applied to any data type. We applied PERForm to a range of diverse datasets, including DILIst, Tox21, and three MAQC-II benchmark datasets, using various modeling algorithms to predict a total of 73 distinct endpoints. For example, AdaBoost exhibited superior performance in DILIst prediction (PERForm AUC of 0.129 for AdaBoost versus 0 for linear regression), whereas linear regression outperformed other models on the majority of Tox21 endpoints (PERForm AUC of 0.301 for linear regression versus 0.283 for AdaBoost, on average). This research marks a significant step toward comprehensively evaluating the utility of an AI model and toward advancing transparency and interpretability in settings where the tradeoff between a model's performance and its interpretability can have profound implications.