Keywords: accelerometer; adaptive boosting; bootstrap aggregating (bagging); ensemble method; oversampling; physical activity; random forest; undersampling

Source: DOI:10.3390/healthcare10071255

Abstract:
Accelerometer data collected from wearable devices have recently been used to monitor physical activities (PAs) in daily life. Although the intensity of PAs can be distinguished with a cut-off approach, estimating energy expenditure also requires discriminating between different behaviors that produce similar accelerometry patterns. We aim to overcome the data imbalance problem that negatively affects machine learning (ML)-based PA classification by extracting well-defined features and applying undersampling and oversampling methods. We extracted various temporal, spectral, and nonlinear features from wrist-, hip-, and ankle-worn accelerometer data. The influences of undersampling and oversampling were then compared using various ML and deep learning (DL) approaches. Among these models, ensemble methods, including random forest (RF) and adaptive boosting (AdaBoost), performed well in differentiating sedentary behavior (driving) and three walking types (walking on level ground, ascending stairs, and descending stairs), even in a cross-subject paradigm. The undersampling approach, which has a low computational cost, produced classification results that were not biased toward the majority class. In addition, we found that RF could automatically select relevant features for PA classification depending on the sensor location by examining the importance of each node in multiple decision trees (DTs). This study proposes that ensemble learning using well-defined feature sets combined with the undersampling approach is robust for imbalanced datasets in PA classification. This approach will be useful for PA classification in free-living settings, where data imbalance between classes is common.
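
The undersampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which undersampling variant was used, so this example assumes plain random undersampling, in which every class is reduced to the size of the smallest class. The function name `random_undersample`, the class labels, and the toy feature values are all hypothetical and are not taken from the study. In a cross-subject evaluation, one would normally apply this step to the training subjects only.

```python
import random
from collections import Counter, defaultdict


def random_undersample(features, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size.

    Hypothetical helper for illustration; not the study's implementation.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Every class is cut down to the size of the rarest class.
    target = min(len(indices) for indices in by_class.values())

    kept = []
    for label in sorted(by_class):
        kept.extend(rng.sample(by_class[label], target))
    kept.sort()  # keep the retained windows in their original order

    return [features[i] for i in kept], [labels[i] for i in kept]


# Toy, made-up data: level walking dominates, stair walking is rare.
labels = (["walking"] * 80 + ["driving"] * 12
          + ["stairs_up"] * 5 + ["stairs_down"] * 3)
features = [[float(i), float(i % 7)] for i in range(len(labels))]

balanced_features, balanced_labels = random_undersample(features, labels)
print(Counter(labels))           # imbalanced class counts
print(Counter(balanced_labels))  # equal counts for every class
```

Because the balanced set is only as large as the rarest class allows, a classifier trained on it sees fewer samples, which is consistent with the low computational cost noted in the abstract.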