Keywords: subgroup; classification tree; lasso; recursive partitioning; regression tree

Source: DOI: 10.21037/atm.2018.03.07

Abstract:
Randomized controlled trials (RCTs) usually enroll a heterogeneous study population, so it is of interest to identify subgroups of patients for whom the treatment may be beneficial or harmful. A variety of methods have been developed for such post hoc analyses. A conventional generalized linear model can include prognostic variables as main effects and predictive variables in interaction with the treatment variable. A statistically significant and large interaction effect usually indicates potential subgroups that may respond differently to the treatment. However, the conventional regression approach requires the interaction terms to be specified in advance, which requires prior knowledge of the predictive variables or becomes infeasible when there are many feature variables. The Least Absolute Shrinkage and Selection Operator (LASSO) performs variable selection by shrinking weaker effects (including interaction effects) to zero, and in this way retains only certain variables and interactions in the model. There are also many tree-based methods for subgroup identification. For example, model-based recursive partitioning incorporates parametric models, such as generalized linear models, into trees. The incorporated model is usually a simple model with only the treatment as a covariate; predictive and prognostic variables are found and incorporated automatically via the tree. The present article gives an overview of these methods and explains how to perform them in the free software environment for statistical computing R (version 3.3.2). A simulated dataset is used to illustrate the performance of these methods.
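As a minimal sketch (not the article's own code), the following R snippet simulates a small RCT-like dataset and applies the three approaches described above: a generalized linear model with a pre-specified treatment-covariate interaction, LASSO over all treatment-covariate interactions via the glmnet package, and model-based recursive partitioning via lmtree from the partykit package. The variable names, the simulated effect sizes, and the choice of packages are assumptions for illustration only.

# A minimal sketch (assumed example, not the authors' code): simulate data with
# one prognostic covariate (x1) and one predictive covariate (x2), then apply
# the three approaches described in the abstract.
set.seed(123)
n   <- 500
x1  <- rnorm(n)                       # prognostic variable (main effect only)
x2  <- rbinom(n, 1, 0.5)              # predictive variable (interacts with treatment)
trt <- rbinom(n, 1, 0.5)              # randomized treatment indicator
y   <- 1 + 0.5 * x1 + 1.0 * trt * x2 + rnorm(n)
dat <- data.frame(y, x1, x2 = factor(x2), trt = factor(trt))

# (I) Conventional regression: the treatment-by-x2 interaction must be
#     pre-specified by the analyst.
fit_glm <- glm(y ~ x1 + trt * x2, data = dat)
summary(fit_glm)

# (II) LASSO: supply all treatment-by-covariate interactions and let the
#      penalty shrink the unimportant ones to zero (glmnet package).
library(glmnet)
X     <- model.matrix(~ (x1 + x2) * trt, data = dat)[, -1]
cvfit <- cv.glmnet(X, dat$y)
coef(cvfit, s = "lambda.min")

# (III) Model-based recursive partitioning: fit y ~ trt in each node and split
#       on x1 and x2 wherever the treatment effect is unstable
#       (lmtree from the partykit package).
library(partykit)
tree <- lmtree(y ~ trt | x1 + x2, data = dat)
plot(tree)

Under this data-generating model one would expect the LASSO fit to retain the treatment-by-x2 interaction and the fitted tree to split on x2, separating subgroups with different treatment effects; the actual output depends on the simulated sample.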