Keywords: Shapley values; interactions; regression; shrinkage; variable importance

Source: DOI: 10.1515/em-2023-0039

Abstract:
Adding two-way interactions to a regression model is a classic problem in statistics, and it comes with the challenge of a dimension that grows quadratically with the number of covariates. We aim (a) to devise an estimation method that can handle this challenge and (b) to aid interpretation of the resulting model by developing computational tools for quantifying variable importance.
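The quadratic growth is easy to quantify: p covariates give p main effects plus p(p - 1)/2 pairwise interactions. The minimal Python sketch below (helper names are hypothetical, not taken from the paper) builds the expanded design matrix with NumPy and prints the coefficient count for a few values of p.

```python
from itertools import combinations

import numpy as np


def n_coefficients(p: int) -> int:
    """Main effects plus all two-way interactions (intercept excluded)."""
    return p + p * (p - 1) // 2


def add_two_way_interactions(X: np.ndarray) -> np.ndarray:
    """Append the product x_j * x_k for every pair j < k to the columns of X."""
    n, p = X.shape
    pairs = list(combinations(range(p), 2))
    if not pairs:
        return X.copy()
    inter = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return np.hstack([X, inter])


for p in (5, 10, 50, 100):
    print(p, n_coefficients(p))
# 5 15
# 10 55
# 50 1275
# 100 5050
```

With 100 covariates the model already has 5,050 coefficients, more than the sample size of many clinical studies, which is what motivates shrinkage rather than unpenalized fitting.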
Existing strategies typically overcome the dimensionality problem by allowing interactions only between relevant main effects. Building on this philosophy, and targeting settings with a moderate ratio of sample size n to number of covariates p, we develop a local shrinkage model that links the shrinkage of interaction effects to the shrinkage of their corresponding main effects. In addition, we derive a new analytical formula for the Shapley value, which allows rapid assessment of individual-specific variable importance scores and their uncertainties.
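For a model that is linear in the main effects and their pairwise products, Shapley values can be computed in closed form instead of by enumerating all 2^p coalitions. The Python sketch below is not the formula derived in the paper; it assumes a baseline value function in which features outside a coalition are fixed at a reference point x* (for example, the sample means). Under that assumption a main effect β_j x_j is credited entirely to feature j, and an interaction β_jk x_j x_k gives feature j the amount β_jk (x_j - x*_j)(x_k + x*_k) / 2, so all contributions add up to f(x) - f(x*). Function names are hypothetical.

```python
import numpy as np


def predict(x, b0, beta, B):
    """f(x) = b0 + sum_j beta_j x_j + sum_{j<k} B[j, k] x_j x_k."""
    Bu = np.triu(np.asarray(B, float), k=1)  # use only the strict upper triangle
    x = np.asarray(x, float)
    return b0 + np.asarray(beta, float) @ x + x @ Bu @ x


def shapley_linear_interactions(x, x_ref, beta, B):
    """Exact Shapley values of predict() relative to a reference point x_ref."""
    x, x_ref = np.asarray(x, float), np.asarray(x_ref, float)
    Bu = np.triu(np.asarray(B, float), k=1)
    Bs = Bu + Bu.T  # symmetric interaction matrix, zero diagonal
    dx = x - x_ref
    phi = np.asarray(beta, float) * dx  # main effects go fully to feature j
    phi += 0.5 * dx * (Bs @ (x + x_ref))  # each interaction is split between j and k
    return phi


rng = np.random.default_rng(0)
p = 4
beta, B = rng.normal(size=p), rng.normal(size=(p, p))
x, x_ref = rng.normal(size=p), np.zeros(p)
phi = shapley_linear_interactions(x, x_ref, beta, B)
# Efficiency check: the contributions sum to f(x) - f(x_ref).
print(np.isclose(phi.sum(), predict(x, 0.0, beta, B) - predict(x_ref, 0.0, beta, B)))  # True
```

Applying the same function to each posterior draw of the coefficients would give a distribution of Shapley values per individual, which is one way to attach uncertainty to these scores.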
We show empirically that our approach provides accurate estimates of the model parameters and highly competitive predictive accuracy. In our Bayesian framework, estimation inherently comes with inference, which facilitates variable selection. We provide comparisons with key competitors and use large-scale cohort data for realistic illustrations and evaluations. Implementing our method in RStan is relatively straightforward and flexible, so it can be adapted to specific needs.
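Because a Bayesian fit returns posterior draws, variable selection can be read directly off them. A minimal Python sketch, assuming the draws have already been exported from the fitted model into an array with one row per draw and one column per coefficient: the rule shown (keep a coefficient whose central 95% credible interval excludes zero) is one common choice and not necessarily the selection procedure used in the paper. The draws in the example are simulated, not model output, and the coefficient names are hypothetical.

```python
import numpy as np


def select_by_credible_interval(draws, names, level=0.95):
    """Keep coefficients whose central credible interval excludes zero.

    draws: (n_draws, n_coef) array of posterior samples.
    """
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    keep = (lo > 0) | (hi < 0)
    return [name for name, k in zip(names, keep) if k]


rng = np.random.default_rng(0)
# Simulated stand-in for posterior draws: two clear effects and one null interaction.
draws = rng.normal(loc=[1.0, -0.8, 0.0], scale=0.1, size=(4000, 3))
print(select_by_credible_interval(draws, ["x1", "x2", "x1:x2"]))  # ['x1', 'x2']
```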
Our method is an attractive alternative to existing strategies for handling interactions in epidemiological and/or clinical studies, because its linked local shrinkage can improve parameter accuracy, prediction, and variable selection. Moreover, it provides appropriate inference and interpretation, and it may compete well with less interpretable machine learning methods in terms of prediction.
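To make "linked local shrinkage" concrete, the Python sketch below draws coefficients from a hypothetical prior, not the paper's exact model: each main effect j gets a half-Cauchy local scale λ_j, and the prior standard deviation of the interaction β_jk is τ_int λ_j λ_k. A small λ_j then shrinks β_j and every interaction involving j together, in the spirit of allowing interactions mainly between relevant main effects. The half-Cauchy choice, the τ values, and the function names are assumptions for illustration.

```python
import numpy as np


def interaction_prior_sd(lam, tau_int=0.5):
    """Prior sd of beta_jk for j < k: tau_int * lam_j * lam_k (zero elsewhere)."""
    lam = np.asarray(lam, float)
    return np.triu(tau_int * np.outer(lam, lam), k=1)


def sample_linked_prior(p, tau_main=1.0, tau_int=0.5, rng=None):
    """One prior draw of local scales, main effects and interactions (upper triangle)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = np.abs(rng.standard_cauchy(p))  # half-Cauchy local scales (assumed)
    beta = rng.normal(0.0, tau_main * lam)  # beta_j ~ N(0, (tau_main * lam_j)^2)
    B = rng.normal(size=(p, p)) * interaction_prior_sd(lam, tau_int)  # linked scales
    return lam, beta, B


sd = interaction_prior_sd([1.0, 0.01, 2.0])
print(sd[0, 1], sd[0, 2])  # 0.005 1.0
```

A Stan program would express the same dependence as a prior statement on the interaction coefficients; the Python version only illustrates how one small local scale propagates to all interactions of that covariate.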