Keywords: Cannabis; Drug policy; Ensemble; Epidemiology; Machine learning; Prediction; Public health law

MeSH: Humans; United States; Cannabis; Legislation, Drug; Marijuana Use / epidemiology; Commerce; Public Policy

Source: DOI:10.1016/j.drugpo.2024.104340

Abstract:
BACKGROUND: There is substantial geographic variability in local cannabis policies within states that have legalized recreational cannabis. This study develops an interpretable machine learning model that uses county-level population demographics, sociopolitical factors, and estimates of substance use and mental illness prevalences to predict the legality of recreational cannabis sales within each U.S. county.
METHODS: We merged county-level data and selected 14 model inputs from the 2010 Census, the 2012 County Presidential Data from the MIT Elections Lab, and Small Area Estimates from the National Surveys on Drug Use and Health (NSDUH) for 2010 to 2012. A county was labeled recreational cannabis legal (RCL) if the sale of recreational cannabis was allowed anywhere in the county in 2014, which yielded 92 RCL and 3002 non-RCL counties. We used synthetic data augmentation and minority oversampling to build an ensemble of 1000 logistic regressions fit on random sub-samples of the data, withholding one state at a time and building models from all remaining states. Performance was evaluated by comparing the predicted policy status with the actual outcomes in 2014.
RESULTS: Compared with the actual RCL policies in 2014, the ensemble's predictions of which counties would transition to RCL achieved a macro-averaged F1 score of 0.61. The main factors associated with legal county-level recreational cannabis sales were the prevalence of past-month cannabis use and the prevalence of past-year cocaine use.
CONCLUSIONS: Using publicly available data from 2010 to 2012, our model achieved appreciable discrimination in predicting which counties had legal recreational cannabis sales in 2014; however, there is room for improvement. Now that model performance has been demonstrated in the first handful of states to legalize cannabis, additional testing with more recent data using time-to-event models is warranted.
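
ILLUSTRATION: The evaluation design described in METHODS (hold out one state, fit an ensemble of logistic regressions on random sub-samples of the remaining states with the minority class oversampled, then score with macro-averaged F1) can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. Everything not stated in the abstract is an assumption: the function names, the row layout (`id`, `state`, `x`, `y`), the 80% sub-sample fraction, the majority-vote aggregation, the 0.5 threshold, and the plain gradient-descent fit. Simple random duplication stands in for the synthetic data augmentation, and the ensemble size is reduced from 1000 to 25 to keep the toy run fast.

```python
import math
import random


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def oversample_minority(rows, rng):
    """Duplicate random minority-class rows until both classes are equal in size.
    Assumption: a stand-in for the synthetic augmentation named in the abstract."""
    by_class = {0: [], 1: []}
    for row in rows:
        by_class[row["y"]].append(row)
    minority, majority = sorted(by_class.values(), key=len)
    if not minority:  # sub-sample contains a single class; nothing to duplicate
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, epochs=200, lr=0.1):
    """Logistic regression fit by batch gradient descent (hypothetical settings)."""
    n_features = len(rows[0]["x"])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row in rows:
            z = b + sum(wi * xi for wi, xi in zip(w, row["x"]))
            err = sigmoid(z) - row["y"]
            grad_b += err
            for j, xj in enumerate(row["x"]):
                grad_w[j] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def predict(model, x):
    w, b = model
    return int(sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5)


def leave_one_state_out(rows, n_models=25, frac=0.8, seed=0):
    """Hold out each state in turn; predict its counties by ensemble majority vote."""
    rng = random.Random(seed)
    preds = {}
    for held_out in sorted({row["state"] for row in rows}):
        train = [row for row in rows if row["state"] != held_out]
        test = [row for row in rows if row["state"] == held_out]
        models = []
        for _ in range(n_models):
            sample = rng.sample(train, max(1, int(frac * len(train))))
            models.append(fit_logistic(oversample_minority(sample, rng)))
        for row in test:
            votes = sum(predict(m, row["x"]) for m in models)
            preds[row["id"]] = int(2 * votes > len(models))
    return preds


# Toy data (invented for illustration): 3 states, 6 counties each, one feature.
toy_rows = [
    {"id": f"{state}-{i}", "state": state, "x": [2.0 if i < 2 else -2.0], "y": int(i < 2)}
    for state in ("A", "B", "C")
    for i in range(6)
]
toy_preds = leave_one_state_out(toy_rows)
score = macro_f1([r["y"] for r in toy_rows], [toy_preds[r["id"]] for r in toy_rows])
print(f"macro F1 on toy data: {score:.2f}")
```

Macro averaging weights the RCL and non-RCL classes equally, which matters when one class is as rare as 92 of 3094 counties: a model that always predicts non-RCL would be highly accurate but would receive an F1 of 0 for the RCL class.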