This study explores the impacts of several key factors on taxi drivers' traffic violations, with the aim of giving traffic management departments a scientific basis for decisions that reduce traffic fatalities and injuries.
A total of 43,458 electronic enforcement records of taxi drivers' traffic violations in Nanchang City, Jiangxi Province, China, from July 1, 2020, to June 30, 2021, were used to explore the characteristics of traffic violations. A random forest algorithm was used to predict the severity of taxi drivers' traffic violations, and 11 factors affecting traffic violations, covering time, road conditions, environment, and taxi companies, were analyzed using the SHapley Additive exPlanations (SHAP) framework.
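To make the attribution idea behind SHAP concrete, the following is a minimal sketch of exact Shapley values for a tiny hand-written scoring function. It is not the study's pipeline: the function `toy_model`, its three binary features, and the all-zero baseline are hypothetical choices made for illustration, and a tree model on real data would normally be explained with a dedicated library rather than by enumerating coalitions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Evaluate the model with only the coalition's features taken from x.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical score over three made-up binary features (assumption).
    district, location, road_grade = z
    return 0.4 * district + 0.3 * location + 0.2 * district * road_grade


phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
# The attributions sum to toy_model(x) - toy_model(baseline).
```

In this toy case the interaction term is split evenly between the two features that share it. Averaging the absolute attributions over many records gives the kind of mean SHAP value used to rank factors.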
First, the ensemble method Balanced Bagging Classifier (BBC) was applied to balance the dataset, reducing the imbalance ratio (IR) of the original dataset from 6.61% to 2.60%. Next, a Random Forest model was built to predict the severity of taxi drivers' traffic violations; it achieved an accuracy, m_F1, m_G-mean, m_AUC, and m_AP of 0.877, 0.849, 0.599, 0.976, and 0.957, respectively. Random Forest outperformed Decision Tree, XGBoost, AdaBoost, and Neural Network models on these performance measures. Finally, the SHAP framework was used to improve the interpretability of the model and identify important factors affecting taxi drivers' traffic violations. Functional district, location of the violation, and road grade had a high impact on the probability of traffic violations, with mean SHAP values of 0.39, 0.36, and 0.26, respectively.
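The imbalance and performance measures named above can be illustrated with a short sketch. It rests on assumptions not confirmed by the text: that IR is the majority-to-minority class count ratio, that the `m_` prefix denotes macro-averaging over classes, and that the G-mean is the geometric mean of per-class recalls. The labels `minor` and `serious` and the toy predictions are made up for the example.

```python
from collections import Counter
from math import prod


def imbalance_ratio(labels):
    # Assumed definition: largest class count divided by smallest class count.
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def per_class_stats(y_true, y_pred):
    """Return {class: (recall, precision)} computed one-vs-rest."""
    stats = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        stats[c] = (recall, precision)
    return stats


def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores.
    f1s = [2 * r * p / (r + p) if r + p else 0.0
           for r, p in per_class_stats(y_true, y_pred).values()]
    return sum(f1s) / len(f1s)


def g_mean(y_true, y_pred):
    # Geometric mean of per-class recalls; low if any class is poorly recalled.
    recalls = [r for r, _ in per_class_stats(y_true, y_pred).values()]
    return prod(recalls) ** (1 / len(recalls))


# Hypothetical severity labels and predictions (assumption, not study data).
y_true = ["minor"] * 6 + ["serious"] * 2
y_pred = ["minor"] * 5 + ["serious"] + ["serious", "minor"]

ir = imbalance_ratio(y_true)
f1 = macro_f1(y_true, y_pred)
gm = g_mean(y_true, y_pred)
```

Under these definitions a model can score well on accuracy while the G-mean stays low, because the G-mean is pulled down by the worst-recalled class.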
The findings of this paper may help reveal the relationship between the influencing factors and the severity of traffic violations, and provide a theoretical basis for reducing taxi drivers' traffic violations and improving road safety management.