Generalizability

  • Article Type: Journal Article
    BACKGROUND: Revealing the role that anoikis resistance plays in CRC is significant for CRC diagnosis and treatment. This study integrated the CRC anoikis-related key genes (CRC-AKGs) and established a novel model to improve the efficiency and accuracy of the prognostic evaluation of CRC.
    METHODS: CRC-ARGs were screened out by differential expression and univariate Cox analysis. CRC-AKGs were obtained through the LASSO machine learning algorithm, and the LASSO Risk-Score was used to build a nomogram clinical prediction model combined with clinical predictors. In parallel, this work developed a web-based dynamic nomogram to facilitate the generalization and practical application of our model.
    RESULTS: We identified 10 CRC-AKGs, and a risk-related prognostic Risk-Score was calculated. Multivariate Cox regression analysis indicated that the Risk-Score, TNM stage, and age were independent risk factors significantly associated with CRC prognosis (p < 0.05). A prognostic model was built to predict outcomes for CRC individuals with satisfactory accuracy (3-year AUC = 0.815). The web interactive nomogram (https://yuexiaozhang.shinyapps.io/anoikisCRC/) showed the strong generalizability of our model. In parallel, a substantial correlation between the tumor microenvironment and the Risk-Score was discovered in the present work.
    CONCLUSIONS: This study reveals the potential role of anoikis in CRC and offers new insights into clinical decision-making in colorectal cancer based on both clinical and sequencing data. In addition, the interactive tool provides researchers with a user-friendly interface to input relevant clinical variables and obtain personalized risk predictions or prognostic assessments based on our established model.
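    As a rough illustration of the kind of pipeline summarized above (a sketch, not the authors' code), the snippet below fits an L1-penalized Cox model to derive a gene-based Risk-Score and then combines it with clinical predictors in a multivariable Cox model; the file name and column names (OS_time, OS_event, age, tnm_stage, gene_*) are hypothetical.

```python
# Hedged sketch: LASSO-like Cox regression for a gene Risk-Score, then a
# combined clinical model. All file and column names are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("crc_cohort.csv")            # expression + clinical table (hypothetical)
gene_cols = [c for c in df.columns if c.startswith("gene_")]

# Step 1: L1-penalized Cox fit on gene expression only.
lasso_cox = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
lasso_cox.fit(df[gene_cols + ["OS_time", "OS_event"]],
              duration_col="OS_time", event_col="OS_event")

# Step 2: Risk-Score = weighted sum of expression for genes whose
# coefficients survive the penalty (the "key genes").
coefs = lasso_cox.params_
key_genes = coefs[coefs.abs() > 1e-6].index.tolist()
df["risk_score"] = df[key_genes] @ coefs[key_genes]

# Step 3: multivariable Cox combining the Risk-Score with clinical predictors
# (tnm_stage assumed numerically encoded), analogous to a nomogram model.
clinical = CoxPHFitter()
clinical.fit(df[["risk_score", "age", "tnm_stage", "OS_time", "OS_event"]],
             duration_col="OS_time", event_col="OS_event")
clinical.print_summary()                      # hazard ratios and p-values
```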

  • Article Type: Journal Article
    No abstract available.

  • Article Type: Journal Article
    A common research protocol in cognitive neuroscience is to train subjects to perform deliberately designed experiments while recording brain activity, with the aim of understanding the brain mechanisms underlying cognition. However, how the results of this protocol of research can be applied in technology is seldom discussed. Here, I review the studies on time processing of the brain as examples of this research protocol, as well as two main application areas of neuroscience (neuroengineering and brain-inspired artificial intelligence). Time processing is a fundamental dimension of cognition, and time is also an indispensable dimension of any real-world signal to be processed in technology. Therefore, one might expect the studies of time processing in cognition to profoundly influence brain-related technology. Surprisingly, I found that the results from cognitive studies on time processing are hardly helpful in solving practical problems. This awkward situation may be due to the lack of generalizability of the results of cognitive studies, which are obtained under well-controlled laboratory conditions, to real-life situations. This lack of generalizability may be rooted in the fundamental unknowability of the world (including cognition). Overall, this paper questions and criticizes the usefulness and prospect of the abovementioned research protocol of cognitive neuroscience. I then give three suggestions for future research. First, to improve the generalizability of research, it is better to study brain activity under real-life conditions instead of in well-controlled laboratory experiments. Second, to overcome the unknowability of the world, we can engineer an easily accessible surrogate of the object under investigation, so that we can predict the behavior of the object under investigation by experimenting on the surrogate. Third, the paper calls for technology-oriented research, with the aim of technology creation instead of knowledge discovery.

  • Article Type: Journal Article
    BACKGROUND: Deep learning has been increasingly investigated for assisting clinical in vitro fertilization (IVF). The first technical step in many tasks is to visually detect and locate sperm, oocytes, and embryos in images. For clinical deployment of such deep learning models, different clinics use different image acquisition hardware and different sample preprocessing protocols, raising the concern over whether the reported accuracy of a deep learning model by one clinic could be reproduced in another clinic. Here we aim to investigate the effect of each imaging factor on the generalizability of object detection models, using sperm analysis as a pilot example.
    METHODS: Ablation studies were performed using state-of-the-art models for detecting human sperm to quantitatively assess how model precision (false-positive detection) and recall (missed detection) were affected by imaging magnification, imaging mode, and sample preprocessing protocols. The results led to the hypothesis that the richness of image acquisition conditions in a training dataset deterministically affects model generalizability. The hypothesis was tested by first enriching the training dataset with a wide range of imaging conditions, then validated through internal blind tests on new samples and external multi-center clinical validations.
    RESULTS: Ablation experiments revealed that removing subsets of data from the training dataset significantly reduced model precision. Removing raw sample images from the training dataset caused the largest drop in model precision, whereas removing 20x images caused the largest drop in model recall. By incorporating different imaging and sample preprocessing conditions into a rich training dataset, the model achieved an intraclass correlation coefficient (ICC) of 0.97 (95% CI: 0.94-0.99) for precision and an ICC of 0.97 (95% CI: 0.93-0.99) for recall. Multi-center clinical validation showed no significant differences in model precision or recall across different clinics and applications.
    CONCLUSIONS: The results validated the hypothesis that the richness of data in the training dataset is a key factor impacting model generalizability. These findings highlight the importance of diversity in a training dataset for model evaluation and suggest that future deep learning models in andrology and reproductive medicine should incorporate comprehensive feature sets for enhanced generalizability across clinics.
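    The ablation logic described in the methods can be pictured with a short sketch: drop one image-acquisition condition at a time from the training pool, retrain, and compare precision and recall. Here train_detector and evaluate are hypothetical stand-ins for the study's detection model and test harness, not its actual API.

```python
# Hedged sketch of a leave-one-condition-out ablation over the training data.
# train_detector() and evaluate() are hypothetical placeholders.
from typing import Callable, Dict, List

def ablation_study(train_images: List[dict],
                   test_images: List[dict],
                   conditions: List[str],
                   train_detector: Callable,
                   evaluate: Callable) -> Dict[str, dict]:
    results = {}
    for cond in conditions:                               # e.g. "20x", "raw_sample", "brightfield"
        subset = [im for im in train_images if im["condition"] != cond]
        model = train_detector(subset)                    # retrain without that condition
        precision, recall = evaluate(model, test_images)  # false positives vs. missed detections
        results[cond] = {"precision": precision, "recall": recall}
    return results
```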

  • Article Type: Journal Article
    Major depressive disorder (MDD) is a serious and heterogeneous psychiatric disorder that requires accurate diagnosis. Resting-state functional MRI (rsfMRI), which captures multiple perspectives on brain structure, function, and connectivity, is increasingly applied in the diagnosis and pathological research of MDD. Different machine learning algorithms have been developed to exploit the rich information in rsfMRI and discriminate MDD patients from normal controls. Despite the recent advances reported, MDD discrimination accuracy still has room for improvement, and the generalizability and interpretability of discrimination methods are not sufficiently addressed either. Here, we propose a machine learning method (MFMC) for MDD discrimination by concatenating multiple features and stacking multiple classifiers. MFMC is tested on the REST-meta-MDD data set, which contains 2428 subjects collected from 25 different sites. MFMC yields 96.9% MDD discrimination accuracy, demonstrating a significant improvement over existing methods. In addition, the generalizability of MFMC is validated by its good performance when the training and testing subjects are from independent sites. The use of XGBoost as the meta classifier allows us to probe the decision process of MFMC. We identify 13 feature values related to 9 brain regions, including the posterior cingulate gyrus, the orbital part of the superior frontal gyrus, and the angular gyrus, which contribute most to the classification and also demonstrate significant differences at the group level. Using these 13 feature values alone reaches 87% of the full performance of MFMC obtained with all feature values. These features may serve as clinically useful diagnostic and prognostic biomarkers for MDD in the future.
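    A minimal sketch of the general idea of concatenating feature blocks and stacking classifiers with XGBoost as the meta classifier, evaluated with site-wise splits, is shown below; the base learners, feature files, and hyperparameters are illustrative assumptions rather than the published MFMC configuration.

```python
# Hedged sketch: stacked classifiers with an XGBoost meta classifier and
# leave-site-out style evaluation. Inputs and hyperparameters are illustrative.
import numpy as np
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import GroupKFold, cross_val_score
from xgboost import XGBClassifier

# X: concatenated rsfMRI feature blocks; y: MDD vs. control; sites: acquisition site
X = np.load("features.npy"); y = np.load("labels.npy"); sites = np.load("sites.npy")

base_learners = [
    ("lr", LogisticRegression(max_iter=5000)),
    ("svm", SVC(probability=True)),
    ("rf", RandomForestClassifier(n_estimators=300)),
]
meta = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model = StackingClassifier(estimators=base_learners, final_estimator=meta, cv=5)

# Group the folds by site so training and testing subjects never share a site,
# which probes generalizability across scanners and protocols.
scores = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=sites)
print("cross-site accuracy:", scores.mean())
```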

  • Article Type: Journal Article
    The three-dimensional lake hydrodynamic model is a powerful tool widely used to assess changes in the hydrological conditions of lakes. However, its computational cost becomes problematic when forecasting the state of large lakes or using high-resolution simulation in small-to-medium-sized lakes. One possible solution is to employ a data-driven emulator, such as a deep learning (DL) based emulator, to replace the original model for fast computing. However, existing DL-based emulators are often black-box and data-dependent models, causing poor interpretability and generalizability in practical applications. In this study, a data-driven emulator is established using a deep neural network (DNN) to replace the original model for fast computing of three-dimensional lake hydrodynamics. Then, the Koopman operator and transfer learning (TL) are employed to enhance the interpretability and generalizability of the emulator. Finally, the generalizability of the DL-based emulators is comprehensively analyzed through linear regression and correlation analysis. These methods are tested against an existing hydrodynamic model of Lake Zurich (Switzerland), whose data are provided by an open-source web-based platform called Meteolakes/Alplakes. According to the results, (1) the DLEDMD offers better interpretability than the DNN because its Koopman operator reveals the linear structure behind the hydrodynamics; (2) the generalizability of the DL-based emulators in three-dimensional lake hydrodynamics is influenced by the similarity between the training and testing data; (3) TL effectively improves the generalizability of the DL-based emulators.
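    The Koopman/EDMD idea that makes such an emulator interpretable can be sketched in a few lines: lift the state with a dictionary of observables and fit one linear operator that advances the lifted state in time. The polynomial dictionary and generic state variables below are placeholders, not the DLEDMD architecture used in the study.

```python
# Hedged sketch of extended dynamic mode decomposition (EDMD): a linear
# Koopman operator K is fitted in a lifted observable space.
import numpy as np

def dictionary(x: np.ndarray) -> np.ndarray:
    """Lift states of shape (n_samples, n_vars) with simple polynomial observables."""
    return np.hstack([np.ones((x.shape[0], 1)), x, x ** 2])

def fit_koopman(snapshots: np.ndarray) -> np.ndarray:
    """snapshots: (T, n_vars) time-ordered states, e.g. temperature and velocity fields."""
    psi_x = dictionary(snapshots[:-1])       # lifted states at time t
    psi_y = dictionary(snapshots[1:])        # lifted states at time t + 1
    K, *_ = np.linalg.lstsq(psi_x, psi_y, rcond=None)  # psi_y ≈ psi_x @ K
    return K

def rollout(K: np.ndarray, x0: np.ndarray, steps: int) -> np.ndarray:
    """Advance the lifted state linearly and read back the original variables."""
    n_vars = x0.shape[0]
    z = dictionary(x0[None, :])
    preds = []
    for _ in range(steps):
        z = z @ K
        preds.append(z[0, 1:1 + n_vars])     # the linear block of the dictionary holds the raw state
    return np.array(preds)
```

    The eigendecomposition of K yields Koopman modes and frequencies, which is the sense in which such an emulator exposes a linear structure behind the nonlinear hydrodynamics.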

  • Article Type: Journal Article
    Does competition affect moral behavior? This fundamental question has been debated among leading scholars for centuries, and more recently, it has been tested in experimental studies yielding a body of rather inconclusive empirical evidence. A potential source of ambivalent empirical results on the same hypothesis is design heterogeneity-variation in true effect sizes across various reasonable experimental research protocols. To provide further evidence on whether competition affects moral behavior and to examine whether the generalizability of a single experimental study is jeopardized by design heterogeneity, we invited independent research teams to contribute experimental designs to a crowd-sourced project. In a large-scale online data collection, 18,123 experimental participants were randomly allocated to 45 randomly selected experimental designs out of 95 submitted designs. We find a small adverse effect of competition on moral behavior in a meta-analysis of the pooled data. The crowd-sourced design of our study allows for a clean identification and estimation of the variation in effect sizes above and beyond what could be expected due to sampling variance. We find substantial design heterogeneity-estimated to be about 1.6 times as large as the average standard error of effect size estimates of the 45 research designs-indicating that the informativeness and generalizability of results based on a single experimental design are limited. Drawing strong conclusions about the underlying hypotheses in the presence of substantive design heterogeneity requires moving toward much larger data collections on various experimental designs testing the same hypothesis.
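    The separation of design heterogeneity from sampling variance mentioned above can be illustrated with a standard random-effects calculation; the DerSimonian-Laird estimator below is a generic sketch, not necessarily the estimator used in the paper, and the input arrays are hypothetical.

```python
# Hedged sketch: between-design variance (tau^2) via DerSimonian-Laird,
# compared against the average standard error of the per-design estimates.
import numpy as np

def design_heterogeneity(effects: np.ndarray, ses: np.ndarray) -> dict:
    """effects, ses: effect size and standard error for each experimental design."""
    w = 1.0 / ses ** 2                               # inverse-variance weights
    mu_fixed = np.sum(w * effects) / np.sum(w)       # fixed-effect pooled mean
    q = np.sum(w * (effects - mu_fixed) ** 2)        # Cochran's Q
    k = len(effects)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-design variance
    return {
        "pooled_mean": mu_fixed,
        "tau": np.sqrt(tau2),                        # design heterogeneity on the SD scale
        "mean_se": ses.mean(),                       # average sampling uncertainty
        "tau_to_mean_se_ratio": np.sqrt(tau2) / ses.mean(),
    }
```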

  • Article Type: Journal Article
    Collaborative efforts to directly replicate empirical studies in the medical and social sciences have revealed alarmingly low rates of replicability, a phenomenon dubbed the 'replication crisis'. Poor replicability has spurred cultural changes targeted at improving reliability in these disciplines. Given the absence of equivalent replication projects in ecology and evolutionary biology, two inter-related indicators offer the opportunity to retrospectively assess replicability: publication bias and statistical power. This registered report assesses the prevalence and severity of small-study effects (i.e., smaller studies reporting larger effect sizes) and decline effects (i.e., effect sizes decreasing over time) across ecology and evolutionary biology using 87 meta-analyses comprising 4,250 primary studies and 17,638 effect sizes. Further, we estimate how publication bias might distort the estimation of effect sizes, statistical power, and errors in magnitude (Type M or exaggeration ratio) and sign (Type S). We show strong evidence for the pervasiveness of both small-study and decline effects in ecology and evolution. There was widespread prevalence of publication bias that resulted in meta-analytic means being over-estimated by (at least) 0.12 standard deviations. The prevalence of publication bias distorted confidence in meta-analytic results, with 66% of initially statistically significant meta-analytic means becoming non-significant after correcting for publication bias. Ecological and evolutionary studies consistently had low statistical power (15%) with a 4-fold exaggeration of effects on average (Type M error rate = 4.4). Notably, publication bias reduced power from 23% to 15% and increased Type M error rates from 2.7 to 4.4 because it creates a non-random sample of effect-size evidence. The sign errors of effect sizes (Type S error) increased from 5% to 8% because of publication bias. Our research provides clear evidence that many published ecological and evolutionary findings are inflated. Our results highlight the importance of designing high-power empirical studies (e.g., via collaborative team science), promoting and encouraging replication studies, testing and correcting for publication bias in meta-analyses, and adopting open and transparent research practices, such as (pre)registration, data- and code-sharing, and transparent reporting.
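    The power, Type M, and Type S quantities referenced above can be reproduced with a simple simulation in the spirit of Gelman and Carlin's retrodesign approach; the sketch below assumes a normal sampling distribution, and the example inputs are illustrative rather than taken from the paper's data.

```python
# Hedged sketch: power, Type M (exaggeration) and Type S (sign) error rates
# for a study with a given true effect and standard error.
import numpy as np
from scipy.stats import norm

def retrodesign(true_effect: float, se: float, alpha: float = 0.05,
                n_sims: int = 200_000, seed: int = 0):
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1 - alpha / 2)                       # two-sided critical value
    estimates = rng.normal(true_effect, se, n_sims)        # sampling distribution of estimates
    significant = np.abs(estimates) > z_crit * se
    power = significant.mean()
    type_m = np.mean(np.abs(estimates[significant])) / abs(true_effect)        # exaggeration ratio
    type_s = np.mean(np.sign(estimates[significant]) != np.sign(true_effect))  # wrong-sign rate
    return power, type_m, type_s

# Illustrative: a small true effect measured by a noisy design.
print(retrodesign(true_effect=0.1, se=0.15))
```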

  • Article Type: Journal Article
    Objective. Mind-wandering is a mental phenomenon where the internal thought process disengages from the external environment periodically. In the current study, we trained EEG classifiers using convolutional neural networks (CNNs) to track mind-wandering across studies. Approach. We transformed the input from raw EEG to band-frequency information (power), single-trial ERP (stERP) patterns, and connectivity matrices between channels (based on inter-site phase clustering). We trained CNN models for each input type from each EEG channel as the input model for the meta-learner. To verify the generalizability, we used leave-N-participants-out cross-validation (N = 6) and tested the meta-learner on the data from an independent study for across-study predictions. Main results. The current results show limited generalizability across participants and tasks. Nevertheless, our meta-learner trained with the stERPs performed the best among the state-of-the-art neural networks. The mapping of each input model to the output of the meta-learner indicates the importance of each EEG channel. Significance. Our study makes the first attempt to train study-independent mind-wandering classifiers. The results indicate that this remains challenging. The stacking neural network design we used allows an easy inspection of channel importance and feature maps.
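    The leave-N-participants-out scheme (N = 6) can be sketched with scikit-learn's group-aware splitter; the feature files below are hypothetical, and a logistic regression stands in for the per-channel CNNs plus meta-learner described above.

```python
# Hedged sketch: every split holds out all trials from 6 participants, so the
# classifier is always tested on people it never saw during training.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit

X = np.load("eeg_features.npy")               # e.g. power / stERP / connectivity features (hypothetical)
y = np.load("mind_wandering_labels.npy")
participants = np.load("participant_ids.npy")

splitter = GroupShuffleSplit(n_splits=10, test_size=6, random_state=0)  # 6 held-out participants
accuracies = []
for train_idx, test_idx in splitter.split(X, y, groups=participants):
    model = LogisticRegression(max_iter=2000)  # stand-in for the CNN + meta-learner stack
    model.fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))
print("held-out-participant accuracy:", np.mean(accuracies))
```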

  • Article Type: Review
    The spatial heterogeneity of landslide influencing factors is the main reason for the poor generalizability of susceptibility evaluation models. This study aimed to construct a comprehensive explanatory framework for landslide susceptibility evaluation models based on the SHAP (SHapley Additive exPlanations)-XGBoost (eXtreme Gradient Boosting) algorithm, analyze the regional characteristics and spatial heterogeneity of landslide influencing factors, and discuss the heterogeneity of the generalizability of the models under different landscapes. Firstly, we selected different regions in a typical mountainous and hilly area and constructed a geospatial database containing 12 landslide influencing factors, such as elevation, annual average rainfall, slope, lithology, and NDVI, through field surveys, satellite images, and a literature review. Subsequently, the landslide susceptibility evaluation model was constructed based on the XGBoost algorithm and the spatial database, and the prediction results of the model were explained in terms of regional topography, geology, and hydrology using the SHAP algorithm. Finally, the model was generalized and applied to regions with both similar and very different topography, geology, meteorology, and vegetation, to explore the spatial heterogeneity of the generalizability of the model. The following conclusions were drawn: the spatial distribution of landslides is heterogeneous and complex, and the contribution of each influencing factor to landslide occurrence has obvious regional characteristics and spatial heterogeneity. The generalizability of the landslide susceptibility evaluation model is spatially heterogeneous, and the model generalizes better to regions with similar regional characteristics. Further explaining the XGBoost landslide susceptibility evaluation model with the SHAP method allows quantitative analysis, from the perspective of both global and local evaluation units, of how the contributions of various factors to disasters differ because of spatial heterogeneity. In summary, the integrated explanatory framework based on the SHAP-XGBoost model can quantify the contribution of influencing factors to landslide occurrence at both global and local levels, which is conducive to the construction and improvement of the influencing-factor system of landslide susceptibility in different regions. It can also provide a reference for predicting potential landslide hazard-prone areas and for Explainable Artificial Intelligence (XAI) research.
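    The SHAP-on-XGBoost explanation step can be sketched as follows; the input table, column names, and hyperparameters are illustrative assumptions, not the study's actual database of 12 influencing factors.

```python
# Hedged sketch: fit an XGBoost susceptibility model on mapping units, then use
# TreeSHAP to read off global and local factor contributions. Column names are
# hypothetical; categorical factors such as lithology are assumed to be encoded.
import pandas as pd
import shap
from xgboost import XGBClassifier

df = pd.read_csv("landslide_units.csv")       # hypothetical evaluation-unit table
factors = ["elevation", "annual_rainfall", "slope", "lithology", "ndvi"]
X, y = df[factors], df["landslide"]           # 1 = landslide occurred in the unit

model = XGBClassifier(n_estimators=400, max_depth=4, eval_metric="logloss")
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)        # (n_units, n_factors) local contributions

# Global view: mean |SHAP| ranks factors for the whole region; a single row of
# shap_values explains one local evaluation unit.
shap.summary_plot(shap_values, X)
```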
