Variable importance

Literature (99 articles)

1 Predicting Short Time-to-Crime Guns: a Machine Learning Analysis of California Transaction Records (2010-2021).

Impact index: 5.801
Published: 5 Sep 2024
Journal: J Urban Health; PMID: 39235727

DOI: 10.1007/s11524-024-00909-0
Article type: Journal Article

Gun-related crime continues to be an urgent public health and safety problem in cities across the US. A key question is: how are firearms diverted from the legal retail market into the hands of gun offenders? With close to 8 million legal firearm transaction records in California (2010-2020) linked to over 380,000 records of recovered crime guns (2010-2021), we employ supervised machine learning to predict which firearms are used in crimes shortly after purchase. Specifically, using random forest (RF) with stratified under-sampling, we predict any crime gun recovery within a year (0.2% of transactions) and violent crime gun recovery within a year (0.03% of transactions). We also identify the purchaser, firearm, and dealer characteristics most predictive of this short time-to-crime gun recovery using SHapley Additive exPlanations and mean decrease in accuracy variable importance measures. Overall, our models show good discrimination, and we are able to identify firearms at extreme risk for diversion into criminal hands. The test set AUC is 0.85 for both models. For the model predicting any recovery, a default threshold of 0.50 results in a sensitivity of 0.63 and a specificity of 0.88. Among transactions identified as extremely risky, e.g., transactions with a score of 0.98 and above, 74% (35/47 in the test data) are recovered within a year. The most important predictive features include purchaser age and caliber size. This study suggests the potential utility of transaction records combined with machine learning to identify firearms at the highest risk for diversion and criminal use soon after purchase.
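The "mean decrease in accuracy" measure mentioned in the abstract above is commonly computed by permuting one feature at a time and recording how much accuracy falls. The following is a minimal sketch of that idea only; the threshold rule `toy_model`, the synthetic data, and all function names are hypothetical stand-ins, not the study's random forest or its transaction records.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    hits = sum(1 for x, y in zip(rows, labels) if model(x) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean decrease in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features untouched.
            column = [x[j] for x in rows]
            rng.shuffle(column)
            permuted = [x[:j] + (v,) + x[j + 1:] for x, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(x):
    """Hypothetical classifier: flags a row when feature 0 (an 'age') is below 30."""
    return 1 if x[0] < 30 else 0


# Synthetic data (assumption): feature 0 drives the label, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [(data_rng.uniform(18, 80), data_rng.random()) for _ in range(500)]
labels = [1 if age < 30 else 0 for age, _ in rows]

vi = permutation_importance(toy_model, rows, labels, n_features=2)
print(vi)  # feature 0 shows a clear accuracy drop; feature 1 shows none
```

Because the toy classifier ignores feature 1 entirely, shuffling that column cannot change any prediction, so its importance is exactly zero, while shuffling feature 0 breaks the link between age and label.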

2 Considerations and targeted approaches to identifying bad actors in exposure mixtures.

Impact index: not available
Published: Jun 2024
Journal: Stat Biosci; PMID: 39220676

DOI: 10.1007/s12561-023-09409-2
Article type: Journal Article

Variable importance is a key statistical issue in exposure mixtures, as it allows a ranking of exposures as potential targets for intervention, and helps to identify bad actors within a mixture. In settings where mixtures have many constituents or high between-constituent correlations, estimators of importance can be subject to bias or high variance. Current approaches to assessing variable importance have major limitations, including reliance on overly strong or incorrect constraints or assumptions, excessive model extrapolation, or poor interpretability, especially regarding practical significance. We sought to overcome these limitations by applying an established doubly-robust, machine learning-based approach to estimating variable importance in a mixtures context. This method reduces model extrapolation, appropriately controls confounding, and provides both interpretability and model flexibility. We illustrate its use with an evaluation of the relationship between telomere length, a measure of biologic aging, and exposure to a mixture of polychlorinated biphenyls (PCBs), dioxins, and furans among 979 US adults from the National Health and Nutrition Examination Survey (NHANES). In contrast with standard approaches for mixtures, our approach selected PCB 180 and PCB 194 as important contributors to telomere length. We hypothesize that this difference could be due to residual confounding in standard methods that rely on variable selection. Further empirical evaluation of this method is needed, but it is a promising tool in the search for bad actors within a mixture.

3 Accelerated and Interpretable Oblique Random Survival Forests.

Impact index: 1.884
Published: 2024
Journal: J Comput Graph Stat; PMID: 39184344

DOI: 10.1080/10618600.2023.2231048
Article type: Journal Article

The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles have high prediction accuracy, but assessing many linear combinations of predictors induces high computational overhead. In addition, few methods have been developed for estimation of variable importance (VI) with oblique RSFs. We introduce a method to increase computational efficiency of the oblique RSF and a method to estimate VI with the oblique RSF. Our computational approach uses Newton-Raphson scoring in each non-leaf node. We estimate VI by negating each coefficient used for a given predictor in linear combinations, and then computing the reduction in out-of-bag accuracy. In benchmarking experiments, we find our implementation of the oblique RSF is hundreds of times faster, with equivalent prediction accuracy, compared to existing software for oblique RSFs. We find in simulation studies that "negation VI" discriminates between relevant and irrelevant numeric predictors more accurately than permutation VI, Shapley VI, and a technique to measure VI using analysis of variance. All oblique RSF methods in the current study are available in the aorsf R package, and additional supplemental materials are available online.
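The negation idea described in the abstract above (flip the sign of a predictor's coefficient, then measure the loss in accuracy) can be illustrated on a much simpler object than a survival forest. The sketch below applies it to a single hypothetical linear split with plain classification accuracy on four hand-made rows; it is not the aorsf implementation, uses no out-of-bag data, and every name in it is an assumption made for illustration.

```python
def predict(coefs, threshold, x):
    """Oblique-style split: classify by a linear combination of the predictors."""
    score = sum(c * v for c, v in zip(coefs, x))
    return 1 if score > threshold else 0


def negation_importance(coefs, threshold, rows, labels):
    """Accuracy lost when the coefficient of each predictor is negated in turn."""

    def acc(cs):
        hits = sum(predict(cs, threshold, x) == y for x, y in zip(rows, labels))
        return hits / len(rows)

    baseline = acc(coefs)
    importances = []
    for j in range(len(coefs)):
        flipped = list(coefs)
        flipped[j] = -flipped[j]  # negate only predictor j's coefficient
        importances.append(baseline - acc(flipped))
    return importances


# Toy data (assumption): the label depends on the sign of predictor 0 only.
rows = [(-2.0, 0.5), (-1.0, -0.5), (1.0, 0.5), (2.0, -0.5)]
labels = [0, 0, 1, 1]
coefs = (1.0, 0.1)  # hypothetical fitted coefficients

vi = negation_importance(coefs, 0.0, rows, labels)
print(vi)  # [1.0, 0.0]
```

Negating the coefficient of predictor 0 reverses every prediction, so accuracy falls from 1 to 0, whereas negating the small coefficient of predictor 1 never moves a score across the threshold.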

4 A novel framework for crash frequency prediction: Geographic support vector regression based on agent-based activity models in Greater Melbourne.

Impact index: 6.376
Published: 19 Nov 2024
Journal: Accid Anal Prev; PMID: 39163666

DOI: 10.1016/j.aap.2024.107747
Article type: Journal Article

The field of spatial analysis in traffic crash studies can often enhance predictive performance by addressing the inherent spatial dependence and heterogeneity in crash data. This research introduces the Geographical Support Vector Regression (GSVR) framework, which incorporates generated distance matrices, to assess spatial variations and evaluate the influence of a wide range of factors, including traffic, infrastructure, socio-demographic, travel demand, and land use, on the incidence of total and fatal-or-serious injury (FSI) crashes across Greater Melbourne's zones. Utilizing data from the Melbourne Activity-Based Model (MABM), the study examines 50 indicators related to peak hour traffic and various commuting modes, offering a detailed analysis of the multifaceted factors affecting road safety. The study shows that active transportation modes such as walking and cycling emerge as significant indicators, reflecting a disparity in safety that heightens the vulnerability of these road users. In contrast, car commuting, while a consistent factor in crash risks, has a comparatively lower impact, pointing to an inherent imbalance in the road environment. This could be interpreted as an unequal distribution of risk and safety measures among different types of road users, where the infrastructure and policies may not adequately address the needs and vulnerabilities of pedestrians and cyclists compared to those of car drivers. Public transportation generally offers safer travel, yet associated risks near train stations and tram stops in city center areas cannot be overlooked. Tram stops profoundly affect total crashes in these areas, while intersection counts more significantly impact FSI crashes in the broader metropolitan area. The study also uncovers the contrasting roles of land use mix in influencing FSI versus total crashes. The proposed framework presents an approach for dynamically extracting distance matrices of varying sizes tailored to the specific dataset, providing a fresh method to incorporate spatial impacts into the development of machine learning models. Additionally, the framework extends a feature selection technique to enhance machine learning models that typically lack comprehensive feature selection capabilities.

5 Development of American Association for Thoracic Surgery Quality Gateway outcome models, analytics, and visualizations for quality assurance.

Impact index: 6.439
Published: 26 Jul 2024
Journal: J Thorac Cardiovasc Surg; PMID: 39069119

DOI: 10.1016/j.jtcvs.2024.07.033
Article type: Journal Article

OBJECTIVE: The study objective was to develop comprehensive quality assurance models for procedural outcomes after adult cardiac surgery.
METHODS: Based on 52,792 cardiac operations in adults performed in 19 hospitals of 3 high-performing hospital systems, models were developed for operative mortality (n = 1271), stroke (n = 895), deep sternal wound infection (n = 122), prolonged intubation (n = 6182), renal failure (n = 1265), prolonged postoperative stay (n = 5418), and reoperations (n = 1693). Random forest quantile classification, a method tailored for challenges of rare events, and model-free variable priority screening were used to identify predictors of events.
RESULTS: A small set of preoperative variables was sufficient to model procedural outcomes for virtually all cardiac operations, including older age; advanced symptoms; left ventricular, pulmonary, renal, and hepatic dysfunction; lower albumin; higher acuity; and greater complexity of the planned operation. Geometric mean performance ranged from .63 to .76. Calibration covered large areas of probability. Continuous risk factors provided high information content, and their association with outcomes was visualized with partial plots. These risk factors differed in strength and configuration among hospitals, as did their risk-adjusted outcomes according to patient risk as determined by counterfactual causal inference within a framework of virtual (digital) twins.
CONCLUSIONS: By using a small set of variables and contemporary machine-learning methods, comprehensive models for procedural operative mortality and major morbidity after adult cardiac surgery were developed based on data from 3 exemplary hospital systems. They provide surgeons, their patients, and hospital systems with 21st century tools for assessing their risks compared with these advanced hospital systems and improving cardiac surgery quality.

6 Linked shrinkage to improve estimation of interaction effects in regression models.

Impact index: not available
Published: Jan 2024
Journal: Epidemiol Methods; PMID: 38989109

DOI: 10.1515/em-2023-0039
Article type: Journal Article

The addition of two-way interactions is a classic problem in statistics, and comes with the challenge of quadratically increasing dimension. We aim a) to devise an estimation method that can handle this challenge and b) to aid interpretation of the resulting model by developing computational tools for quantifying variable importance.
Existing strategies typically overcome the dimensionality problem by only allowing interactions between relevant main effects. Building on this philosophy, and aiming for settings with moderate n to p ratio, we develop a local shrinkage model that links the shrinkage of interaction effects to the shrinkage of their corresponding main effects. In addition, we derive a new analytical formula for the Shapley value, which allows rapid assessment of individual-specific variable importance scores and their uncertainties.
We empirically demonstrate that our approach provides accurate estimates of the model parameters and very competitive predictive accuracy. In our Bayesian framework, estimation inherently comes with inference, which facilitates variable selection. Comparisons with key competitors are provided. Large-scale cohort data are used to provide realistic illustrations and evaluations. The implementation of our method in RStan is relatively straightforward and flexible, allowing for adaptation to specific needs.
Our method is an attractive alternative to existing strategies to handle interactions in epidemiological and/or clinical studies, as its linked local shrinkage can improve parameter accuracy, prediction and variable selection. Moreover, it provides appropriate inference and interpretation, and may compete well with less interpretable machine learners in terms of prediction.
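To make the notion of an individual-specific Shapley importance score concrete for a regression model with a two-way interaction, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets. It is a generic illustration, not the analytical formula derived in the paper: the model `toy_regression`, its coefficients, and the choice of a fixed all-zero reference point as the baseline are all assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to the baseline."""
    n = len(x)

    def value(subset):
        # Features in the subset take their observed value, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                members = set(s)
                total += weight * (value(members | {j}) - value(members))
        phi.append(total)
    return phi


def toy_regression(z):
    """Hypothetical fitted model: two main effects plus their interaction."""
    return 1.0 * z[0] + 1.0 * z[1] + 1.0 * z[0] * z[1]


phi = shapley_values(toy_regression, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(phi)  # [2.0, 3.0]
```

With this baseline the prediction at x is 5, and the scores 2 and 3 add up to it: each feature keeps its own main effect (1 and 2), and the interaction term (2) is split equally between the two. Enumeration costs grow exponentially with the number of features, which is the kind of burden a closed-form expression avoids.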

7 Non-linear response of Norway spruce to climate variation along elevational and age gradients in the Carpathians.

Impact index: 8.431
Published: 1 Jul 2024
Journal: Environ Res; PMID: 38710428

DOI: 10.1016/j.envres.2024.119073
Article type: Journal Article

Climate change, namely increased warming coupled with a rise in extreme events (e.g., droughts, storms, heatwaves), is negatively affecting forest ecosystems worldwide. In these ecosystems, growth dynamics and biomass accumulation are driven mainly by environmental constraints, inter-tree competition, and disturbance regimes. Usually, climate-growth relationships are assessed by linear correlation due to the simplicity and straightforwardness of modeling. However, applying this method may bias results, since the ecological and physiological responses of trees to environmental factors are non-linear, and usually bell-shaped. In the Eastern Carpathians, Norway spruce is at the southeasternmost edge of its natural occurrence; this region is thus potentially vulnerable to climate change. A non-linear assessment of climate-growth relationships using machine-learning techniques for Norway spruce in this area had not been conducted prior to this study. To address this knowledge gap, we analyzed a large tree-ring network from 158 stands, with over 3000 trees of varying age distributed along an elevational gradient. Our results showed that non-linearity in the growth-climate response of spruce was season-specific: temperatures from the previous autumn and current growing season, along with water availability during winter, induced a bell-shaped response. Moreover, we found that at low elevations, spruce growth was mainly limited by water availability in the growing season, while winter temperatures are likely to have had a slight influence along the entire elevational gradient. Furthermore, at elevations lower than 1400 m, spruce trees were also found to be sensitive to previous autumn water availability. Overall, our results shed new light on the response of Norway spruce to climate in the Carpathians, which may aid in management decisions.

8 Insight into the correlation of key taste substances and key volatile substances from shrimp heads at different temperatures.

Impact index: 9.231
Published: 30 Aug 2024
Journal: Food Chem; PMID: 38688226

DOI: 10.1016/j.foodchem.2024.139150
Article type: Journal Article

This study aimed to investigate taste substances of shrimp heads stored at 20 °C, 4 °C, -3 °C, and -18 °C, and the correlation between taste substances and 25 key volatile substances. Notably, samples stored at 20 °C showed significant changes in bitter amino acids and hypoxanthine, and quickly deteriorated. Samples stored at 4 °C for 14 d or -3 °C for 30 d facilitated the development of umami amino acids, sweet amino acids, and IMP. Furthermore, samples stored at -18 °C for 30 d demonstrated no significant changes in taste profile. Changes in taste substances through quantitative analysis were consistent with changes in taste profile through e-tongue analysis. Based on the results of O2PLS (VIP > 1), Cys, Arg, Glu, Ser, Val, Ala, Ile, ADP, and IMP were correlated with 25 key volatile substances. This study provides fundamental data for the storage, transportation, and value-added utilization of shrimp heads.

9 Exploring cranial macromorphoscopic variation and classification accuracy in a South African sample.

Impact index: 2.791
Published: 16 Sep 2024
Journal: Int J Legal Med; PMID: 38622313

DOI: 10.1007/s00414-024-03230-2
Article type: Journal Article

To date, South African forensic anthropologists are only able to successfully apply a metric approach to estimate population affinity when constructing a biological profile from skeletal remains. While a non-metric, or macromorphoscopic approach exists, limited research has been conducted to explore its use in a South African population. This study aimed to explore 17 cranial macromorphoscopic traits to develop improved methodology for the estimation of population affinity among black, white and coloured South Africans and for the method to be compliant with standards of best practice. The trait frequency distributions revealed substantial group variation and overlap, and not a single trait can be considered characteristic of any one population group. Kruskal-Wallis and Dunn's tests demonstrated significant population differences for 13 of the 17 traits. Random forest modelling was used to develop classification models to assess the reliability and accuracy of the traits in identifying population affinity. Overall, the model including all traits obtained a classification accuracy of 79% when assessing population affinity, which is comparable to current craniometric methods. The variable importance indicates that all the traits contributed some information to the model, with the inferior nasal margin, nasal bone contour, and nasal aperture shape ranked the most useful for classification. Thus, this study validates the use of macromorphoscopic traits in a South African sample, and the population-specific data from this study can potentially be incorporated into forensic casework and skeletal analyses in South Africa to improve population affinity estimates.

10 Evaluating postcranial macromorphoscopic traits to estimate population variation among modern South Africans.

Impact index: 2.676
Published: 7 Mar 2024
Journal: Forensic Sci Int; PMID: 38382241

DOI: 10.1016/j.forsciint.2024.111954
Article type: Journal Article

Population overlap and the variation within and among populations have been globally observed but are often difficult to quantify. To achieve this, numerous different methods need to be explored and validated to assist with the creation of an accurate biological profile. The current lack of databases for postcranial macromorphoscopic traits indicates the need to further investigate if the method can be employed repeatably in a forensic context. The current study aimed to assess the prevalence of eleven postcranial macromorphoscopic traits in a South African sample. A total of 271 postcrania of adult black, coloured, and white South Africans were assessed. The intra- and inter-observer agreement ranged from fair to almost perfect except for the accessory transverse foramen of C1, which had poor agreement between observers. Only seven traits differed significantly between at least two of the groups. Univariate and multivariate random forest models were created to test the positive predictive performance of the traits to classify population affinity. The classification accuracies for the univariate models ranged from 33.3% to 53.0% and ranged from 54.6% to 62.1% for the multivariate models. Based on the variable importance, the traits assessing spinous process bifurcation were the most discriminatory variables. The results indicate that the postcranial MMS approach does not outperform current methods employed to estimate population affinity. Further research needs to be done for the method to have practical applicability for medicolegal casework in South Africa.
