Keywords: Ecological model; Geospatial analysis; Machine learning model; SHAP; Wolf; XGBoost

MeSH: Animals; Humans; Wolves; Probability; Germany

Source: DOI:10.1007/s00267-024-01941-1

Abstract:
Wolves have been returning to Germany since 2000, and their numbers had grown to 209 territorial pairs by 2021. XGBoost machine learning, combined with SHAP analysis, is applied to predict the presence of German wolf pairs in 2022 for 10 × 10 km grid cells. Model input consisted of 38 variables from open sources, covering the period 2000 to 2021. The XGBoost model predicted well, with an AUC of 0.91. SHAP analysis ranked the variables: distance to the closest neighboring wolf pair was the main driver for a grid cell to become occupied by a wolf pair. The clustering tendency of related wolves seems to be an important explanatory factor here. Second was the percentage of wooded area. Positions three to ten were taken by variables describing wolf presence in the preceding year, except for positions five, eight and ten, which were human density (square root) in the grid cell, percentage of arable land and road density, respectively. The remaining variables, including the occurrence of wild prey, were the weakest predictors. The SHAP analysis also provided crucial added value by identifying variables with threshold values at which their contribution to the prediction changed from positive to negative or vice versa. For instance, a low density of people increased the probability of wolf pair presence, whereas a high density decreased it. Cumulative lift techniques showed that the model performed almost four times better than random prediction. The combination of XGBoost, SHAP and cumulative lift techniques is new in wolf management and conservation, and it allows educational and financial resources to be focused.
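The abstract does not spell out how cumulative lift was computed, so the following is only a minimal sketch of the generic definition: rank the grid cells by predicted probability, take the top fraction, and divide the share of occupied cells in that fraction by the share of occupied cells overall. The function name `cumulative_lift` and the toy labels and scores are assumptions made for illustration; they are not the study's data or code.

```python
def cumulative_lift(y_true, y_score, fraction):
    """Lift of the top `fraction` of cases, ranked by score, over random selection.

    y_true  : list of 0/1 labels (1 = grid cell occupied by a wolf pair)
    y_score : list of predicted probabilities, same length as y_true
    fraction: share of cases to select, in (0, 1]
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("lift is undefined when there are no positive cases")

    # Rank cases from highest to lowest predicted probability.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    # Number of cases in the selected top fraction (at least one).
    k = max(1, round(fraction * len(ranked)))

    # Positive rate among the selected cases versus the overall base rate.
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = total_positives / len(ranked)
    return top_rate / base_rate


# Hypothetical toy data: 10 grid cells, 2 of them occupied.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

for frac in (0.2, 0.3, 1.0):
    print(frac, cumulative_lift(y_true, y_score, frac))
```

A lift of 1 corresponds to random selection, so a value close to 4 for a given fraction would mean that the top-ranked cells contain almost four times as many occupied cells as an equally sized random sample.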