Keywords: Bias; Machine learning; Risk assessment; Violence

Source: DOI:10.1016/j.jbi.2024.104709

Abstract:
OBJECTIVE: Natural language processing and machine learning have the potential to produce biased predictions. We designed a novel Automated RIsk Assessment (ARIA) machine learning algorithm that assesses the risk of violence and aggression in adolescents using natural language processing of transcribed student interviews. This work evaluated possible sources of bias in the study design and the algorithm, tested how much of the prediction was explained by demographic covariates, and examined misclassifications across demographic variables.
METHODS: We recruited students 10-18 years of age who were enrolled in middle or high schools in Ohio, Kentucky, Indiana, and Tennessee. The reference standard outcome was a "high" or "low" risk level determined by a forensic psychiatrist. ARIA used L2-regularized logistic regression on contextual and semantic features to predict a risk level for each student. We conducted three analyses: a PROBAST analysis of risk of bias in the study design; an analysis of demographic variables as covariates; and a prediction analysis. The covariates included in the linear regression analyses were race, sex, ethnicity, household education, annual household income, age at the time of visit, and use of public assistance.
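To make the two modeling steps above concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes scikit-learn is available and uses placeholder arrays and hypothetical variable names in place of study data. It fits an L2-regularized logistic regression that produces a risk score from interview-derived features, then regresses that score on demographic covariates to obtain a coefficient of determination.

```python
# Minimal sketch (assumptions: scikit-learn, placeholder data, hypothetical names).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)

# --- Risk prediction: L2-regularized logistic regression on interview features ---
# X_features: contextual/semantic features per student (placeholder, not study data)
# y_risk: reference labels from the forensic psychiatrist, 1 = "high", 0 = "low"
X_features = rng.normal(size=(412, 50))
y_risk = rng.integers(0, 2, size=412)
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
clf.fit(X_features, y_risk)
risk_scores = clf.predict_proba(X_features)[:, 1]  # continuous risk assessment score

# --- Bias analysis: how much of the risk score demographics explain (R^2) ---
# X_demo: numerically encoded race, sex, ethnicity, household education, income,
# age at visit, public assistance (placeholder columns)
X_demo = rng.normal(size=(412, 7))
lin = LinearRegression().fit(X_demo, risk_scores)
print("R^2 of demographics vs. risk score:", lin.score(X_demo, risk_scores))
```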
RESULTS: We recruited 412 students from 204 schools; of these, 387 students with complete demographic information were included in the analysis. ARIA performed with an AUC of 0.92, sensitivity of 71%, specificity of 95%, and NPV of 77%. Individual linear regressions yielded a coefficient of determination below 0.08 for every demographic variable. When all demographic variables were used together to predict ARIA's risk assessment score, the multiple linear regression model yielded a coefficient of determination of 0.189. ARIA had a lower false negative rate (FNR) of 15.2% (CI [0-40]) for the Black subgroup and 12.7% (CI [0-41.4]) for Other races, compared with an FNR of 26.1% (CI [14.1-41.8]) in the White subgroup.
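The subgroup comparison above rests on the false negative rate. A small illustrative Python snippet (hypothetical data, not study results) shows how an FNR can be computed per demographic group from reference labels and model predictions:

```python
# Illustration only: per-group false negative rate on made-up data.
import numpy as np

def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP): share of true high-risk cases the model missed."""
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return fn / (fn + tp) if (fn + tp) > 0 else float("nan")

# Hypothetical placeholders standing in for reference labels, predictions, and groups.
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
groups = np.array(["White", "White", "Black", "Black", "Other", "Other", "White", "Black"])

for g in np.unique(groups):
    mask = groups == g
    print(g, "FNR =", false_negative_rate(y_true[mask], y_pred[mask]))
```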
CONCLUSIONS: Bias assessment is needed to address shortcomings within machine learning. In our work, student race, ethnicity, sex, use of public assistance, and annual household income did not explain ARIA's risk assessment scores. ARIA will continue to be evaluated regularly as subject recruitment increases.