%0 Journal Article %T Investigation of bias in the automated assessment of school violence. %A Kanbar LJ %A Mishra A %A Osborn A %A Cifuentes A %A Combs J %A Sorter M %A Barzman D %A Dexheimer JW %J J Biomed Inform %V 157 %N 0 %D 2024 Aug 15 %M 39153563 %F 8 %R 10.1016/j.jbi.2024.104709 %X OBJECTIVE: Natural language processing and machine learning have the potential to produce biased predictions. We designed a novel Automated RIsk Assessment (ARIA) machine learning algorithm that assesses the risk of violence and aggression in adolescents using natural language processing of transcribed student interviews. This work evaluated possible sources of bias in the study design and the algorithm, tested how much of each prediction was explained by demographic covariates, and investigated misclassifications across demographic variables.
METHODS: We recruited students 10-18 years of age who were enrolled in middle or high schools in Ohio, Kentucky, Indiana, and Tennessee. The reference standard outcome was a "high" or "low" risk level determined by a forensic psychiatrist. ARIA used L2-regularized logistic regression over contextual and semantic features to predict a risk level for each student. We conducted three analyses: a PROBAST analysis of risk of bias in the study design; an analysis of demographic variables as covariates; and a prediction analysis. Covariates were included in the linear regression analyses and comprised race, sex, ethnicity, household education, annual household income, age at the time of visit, and utilization of public assistance.
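As an illustration of the modeling approach described in METHODS, the sketch below pairs an L2-regularized logistic regression (the risk prediction step) with an ordinary least-squares regression of the demographic covariates against the resulting risk scores (the covariate analysis). It is a minimal sketch assuming a scikit-learn implementation; the data, feature dimensions, and variable names are placeholders, not the study's code or data.

# Minimal sketch (not the authors' implementation): L2-regularized logistic
# regression for risk prediction, plus an OLS regression of demographic
# covariates on the resulting risk scores. All data here are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(387, 200))      # placeholder contextual/semantic features
y = rng.integers(0, 2, size=387)     # placeholder reference labels (1 = high risk)

clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
clf.fit(X, y)
risk_score = clf.predict_proba(X)[:, 1]   # continuous risk assessment score

# Covariate analysis: how much of the risk score do demographics explain?
# D stands in for race, sex, ethnicity, household education, income, age,
# and public assistance, encoded numerically (placeholder values here).
D = rng.normal(size=(387, 7))
covariate_model = LinearRegression().fit(D, risk_score)
print(f"R^2 of demographics vs. risk score: {covariate_model.score(D, risk_score):.3f}")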
RESULTS: We recruited 412 students from 204 schools; of these, 387 students with complete demographic information were included in the analysis. ARIA performed with an AUC of 0.92, sensitivity of 71%, NPV of 77%, and specificity of 95%. Individual linear regressions resulted in a coefficient of determination below 0.08 for every demographic variable. When all demographic variables were used together to predict ARIA's risk assessment score, the multiple linear regression model resulted in a coefficient of determination of 0.189. ARIA had a lower false negative rate (FNR) of 15.2% (CI [0-40]) in the Black subgroup and 12.7% (CI [0-41.4]) in the Other race subgroup, compared with an FNR of 26.1% (CI [14.1-41.8]) in the White subgroup.
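The subgroup comparison reported above uses the false negative rate, FNR = FN / (FN + TP), computed separately within each racial subgroup. The short Python example below is a hypothetical illustration of that calculation; the labels, predictions, and group assignments are toy values, not study data.

# Hedged example: per-subgroup false negative rate (FNR = FN / (FN + TP)).
# Arrays are toy placeholders, not the study's data.
import numpy as np

def false_negative_rate(y_true, y_pred):
    # Fraction of actual positives (high-risk students) predicted as low risk.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positives = y_true == 1
    return float("nan") if positives.sum() == 0 else float(np.mean(y_pred[positives] == 0))

y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0])   # 1 = high risk (reference standard)
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0])   # model predictions (placeholder)
race = np.array(["Black", "White", "White", "Other", "Black", "White", "White", "Other"])

for group in np.unique(race):
    mask = race == group
    print(group, round(false_negative_rate(y_true[mask], y_pred[mask]), 3))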
CONCLUSIONS: Bias assessment is needed to address shortcomings in machine learning models. In our work, student race, ethnicity, sex, use of public assistance, and annual household income did not explain ARIA's risk assessment scores. ARIA will continue to be evaluated regularly as subject recruitment increases.