Keywords: Distributed random forest; Machine learning; Opioid misuse; Penalized logistic regression; Substance use

MeSH: Adolescent; Adolescent Behavior; Algorithms; Area Under Curve; Child; Female; Humans; Logistic Models; Machine Learning; Male; Opioid-Related Disorders / epidemiology; Risk Assessment / methods; Surveys and Questionnaires; United States / epidemiology

Source: DOI:10.1016/j.ypmed.2019.105886

Abstract:
This study evaluated the performance of three machine learning (ML) techniques in predicting opioid misuse among U.S. adolescents. Data were drawn from the 2015-2017 National Survey on Drug Use and Health (N = 41,579 adolescents, ages 12-17 years) and analyzed in 2019. Prediction models were developed using three ML algorithms: artificial neural networks, distributed random forest, and gradient boosting machine. Their performance was compared with that of penalized logistic regression. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used as metrics of prediction performance. The AUPRC served as the primary measure because it is considered more informative than the AUROC for assessing binary classifiers on an imbalanced outcome variable. The overall rate of opioid misuse among U.S. adolescents was 3.7% (n = 1521). Prediction performance was similar across the four models (AUROC values ranged from 0.809 to 0.815). In terms of the AUPRC, the distributed random forest performed best (0.172), followed by penalized logistic regression (0.162), gradient boosting machine (0.160), and artificial neural networks (0.157). The findings suggest that machine learning techniques are promising for predicting rare outcomes (i.e., when the binary outcome variable is heavily lopsided), such as adolescent opioid misuse.
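The contrast between the two metrics on a rare outcome can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the data are synthetic (not the survey data), the score distributions and the separation value of 1.2 are arbitrary choices that happen to yield an AUROC near 0.8, and average precision is used as one common step-wise estimate of the AUPRC, which may differ from the estimator the authors used. The point is only that a chance-level classifier scores about 0.5 on AUROC regardless of prevalence, whereas its AUPRC is roughly equal to the prevalence (about 0.037 here), so AUPRC values should be read against that much lower baseline.

```python
import random


def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney) identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of items tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, scores):
    """Mean of precision values at each true positive, scanning by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            true_pos += 1
            total += true_pos / rank
    return total / sum(y_true)


def simulate(n=41579, prevalence=0.037, separation=1.2, seed=0):
    """Synthetic labels and scores (hypothetical; not the survey data)."""
    rng = random.Random(seed)
    y_true, scores = [], []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        # Positives score higher on average by `separation` standard deviations.
        scores.append(rng.gauss(separation * y, 1.0))
        y_true.append(y)
    return y_true, scores


y_true, scores = simulate()
print("prevalence        :", round(sum(y_true) / len(y_true), 4))
print("AUROC             :", round(auroc(y_true, scores), 3))
print("average precision :", round(average_precision(y_true, scores), 3))
```

Under these assumptions, the same set of scores looks strong on AUROC and modest on average precision, which is why small absolute differences in AUPRC between models can still be meaningful when the outcome is rare.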