regression tree

  • Article type: Journal Article
    Anthropogenic influences significantly modify the hydrochemical properties and material flows of riverine ecosystems across Asia, which potentially account for 40-50% of global emissions. Despite this pervasive impact on Asian rivers, few studies have investigated its relationship with carbon dioxide (CO2) emissions. In this study, we computed the partial pressure of CO2 (pCO2) using a carbonate equilibria-based model (pCO2SYS) and examined its correlation with hydrochemical parameters, using historical records from 91 stations on the Ganga River spanning 2013-2021. The investigation revealed substantial spatial heterogeneity in pCO2 along the Ganga River. Mean pCO2 was 1321.76 μatm, 1130.98 μatm, and 1174.33 μatm in the upper, middle, and lower stretches, respectively, with an overall mean of 1185.29 μatm. Interestingly, the upper stretch exhibited higher mean pCO2 and CO2 flux (FCO2: 3.63 g m-2 d-1) than the middle and lower stretches, underscoring the intricate interplay between hydrochemistry and CO2 dynamics. Nitrate concentration in the upper stretch, and biochemical oxygen demand (BOD) and dissolved oxygen (DO) levels in the middle and lower stretches, emerged as key explanatory factors for pCO2 fluctuations. Furthermore, regression tree (RT) and importance analyses identified BOD as the most important factor influencing pCO2 variation across the Ganga River (n = 91). A strong negative correlation between BOD and FCO2 was also observed. The distinct longitudinal patterns of the two parameters may induce the negative correlation between BOD and pCO2. Comprehensive studies are therefore needed to decipher the mechanisms governing this relationship. These insights help in understanding the CO2 emission potential of the Ganga River and can support riverine restoration and management. Our findings underscore the importance of incorporating South Asian rivers into evaluations of the global carbon budget.
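The regression tree (RT) and importance analysis mentioned above can be illustrated with a minimal CART-style sketch. Each split is chosen greedily to maximize the reduction in the sum of squared errors (SSE) of pCO2. A predictor's importance is the total SSE reduction from the splits that use it. This is an assumption about one common way of computing RT importance, not necessarily the authors' procedure. The station values for BOD, DO, and NO3 below are hypothetical toy numbers, not the study's records.

```python
# Minimal CART-style regression tree with impurity-based importance.
# Toy data only: the BOD/DO/NO3 values and pCO2 responses are hypothetical.

def sse(values):
    """Sum of squared deviations from the mean (regression node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y, feature):
    """Return (threshold, SSE reduction) of the best binary split on one feature."""
    parent = sse(y)
    best_t, best_gain = None, 0.0
    levels = sorted({r[feature] for r in rows})
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2  # midpoint between adjacent observed values
        left = [yi for r, yi in zip(rows, y) if r[feature] <= t]
        right = [yi for r, yi in zip(rows, y) if r[feature] > t]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def grow(rows, y, features, importance, depth=0, max_depth=2):
    """Grow the tree greedily; credit each split's SSE reduction to its feature."""
    if depth >= max_depth or len(y) < 2:
        return sum(y) / len(y)  # leaf predicts the mean response
    feature, t, gain = max(
        ((f, *best_split(rows, y, f)) for f in features), key=lambda c: c[2]
    )
    if t is None:  # no split reduces SSE
        return sum(y) / len(y)
    importance[feature] += gain
    left = [i for i, r in enumerate(rows) if r[feature] <= t]
    right = [i for i, r in enumerate(rows) if r[feature] > t]
    return {
        "feature": feature,
        "threshold": t,
        "left": grow([rows[i] for i in left], [y[i] for i in left],
                     features, importance, depth + 1, max_depth),
        "right": grow([rows[i] for i in right], [y[i] for i in right],
                      features, importance, depth + 1, max_depth),
    }


features = ["BOD", "DO", "NO3"]
bod = [1, 2, 3, 4, 5, 6, 7, 8]
do = [7, 5, 8, 6, 7, 5, 8, 6]
no3 = [0.3, 0.9, 0.5, 0.7, 0.4, 0.8, 0.6, 0.2]
pco2 = [1400, 1390, 1410, 1405, 1100, 1110, 1095, 1105]
rows = [{"BOD": b, "DO": d, "NO3": n} for b, d, n in zip(bod, do, no3)]

importance = dict.fromkeys(features, 0.0)
tree = grow(rows, pco2, features, importance)
total = sum(importance.values())
shares = {f: round(g / total, 3) for f, g in importance.items()}
print(tree["feature"], tree["threshold"])  # -> BOD 4.5
print(shares)
```

The toy pCO2 values drop sharply between BOD 4 and 5, so the root split lands on BOD and BOD collects most of the importance. A real analysis would also need stopping rules or pruning chosen by cross-validation.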

  • Article type: Journal Article
    COVID-19 has swept the world, causing an unprecedented and widely noted decline in transit ridership. However, little attention has been paid to the resilience of transportation systems, particularly in medium-sized cities. Drawing on a light rail ridership dataset from Salt Lake County covering 2017 to 2021, we developed a novel method to measure the vulnerability and resilience of transit ridership using a Bayesian structural time series model. The results show that government policies had a greater impact on transit ridership than the number of COVID-19 cases. Regarding the built environment, a highly compact urban design might reduce the building coverage ratio and make transit stations more vulnerable and less resilient. Furthermore, a high proportion of minority residents was the primary reason for the drops in transit ridership. The findings are valuable for understanding the vulnerability and resilience of transit ridership to pandemics and for developing better coping strategies in the future.

  • Article type: Journal Article
    Accurate subtyping of non-small cell lung cancer (NSCLC) into lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) is the cornerstone of NSCLC diagnosis. Cytology samples show higher rates of classification failure than histology specimens, that is, more cases subtyped as non-small cell carcinoma-not otherwise specified (NSCC-NOS). This study aimed to identify specific algorithms, based on known cytomorphologic features, that aid accurate and successful subtyping of NSCLC on cytology.
    A total of 13 expert cytopathologists participated anonymously in an online survey to subtype 119 NSCLC cytology cases (gold standard diagnoses: LUAD in 80, LUSC in 39) enriched for nonkeratinizing LUSC. From 23 predefined cytomorphologic features, they selected those they used in subtyping. Data were analyzed using machine learning algorithms based on the random forest method and regression trees.
    Of the 1474 responses recorded, concordant cytology typing was achieved in 53.7% (792 of 1474). NSCC-NOS rates on cytology were similar for gold standard LUAD (36%) and LUSC (38%) cases. Misclassification rates were higher for gold standard LUSC (17.6%) than for gold standard LUAD (5.5%; p < 0.0001). Keratinization, when present, identified LUSC with high accuracy. In its absence, the machine learning algorithms developed from the experts' choices were unable to reduce cytology NSCC-NOS rates without increasing misclassification rates.
    Suboptimal recognition of LUSC in the absence of keratinization remains the major hurdle to improving cytology subtyping accuracy. Such cases either fail classification (NSCC-NOS) or are misclassified as LUAD. NSCC-NOS appears to be an inevitable morphologic diagnosis, which emphasizes that ancillary immunochemistry is necessary to achieve accurate subtyping on cytology.
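The abstract scores each response in one of three ways: concordant with the gold standard, a classification failure (NSCC-NOS), or misclassified as the other subtype. The sketch below tallies these rates per gold-standard subtype. It assumes each response is a (gold standard, cytology call) pair. Only the 792/1474 figure comes from the study; the toy responses are hypothetical.

```python
from collections import Counter

# Outcome categories follow the abstract; the toy responses are hypothetical.


def outcome(gold, call):
    """Classify one cytology call against the gold-standard subtype."""
    if call == "NSCC-NOS":
        return "NOS"  # classification failure
    return "concordant" if call == gold else "misclassified"


def rates_by_gold(responses):
    """Percentage of responses in each outcome, per gold-standard subtype."""
    counts = {}
    for gold, call in responses:
        counts.setdefault(gold, Counter())[outcome(gold, call)] += 1
    return {
        gold: {k: round(100 * v / sum(c.values()), 1) for k, v in c.items()}
        for gold, c in counts.items()
    }


print(round(100 * 792 / 1474, 1))  # overall concordance reported above -> 53.7

toy = [("LUSC", "NSCC-NOS"), ("LUSC", "LUAD"), ("LUSC", "LUSC"), ("LUSC", "LUSC"),
       ("LUAD", "LUAD"), ("LUAD", "NSCC-NOS")]
print(rates_by_gold(toy))
```

Keeping NOS separate from misclassification shows the trade-off the abstract describes. A rule that turns NOS calls into subtype calls can only lower the NOS rate by moving some responses into either the concordant or the misclassified column.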

  • Article type: Journal Article
    To operationalize an intersectionality framework using a novel statistical approach and, through this effort, to improve the estimation of disparities beyond race in access to opioid use disorder (OUD) treatment (i.e., wait time to treatment entry).
    The sample comprised 941,286 treatment episodes collected in 2015-2017 in the United States from the Treatment Episode Data Set-Admissions (TEDS-A), plus subsets from California (n = 188,637) and Maryland (n = 184,276), the states with the largest samples of episodes.
    This retrospective subgroup analysis used a two-step approach called virtual twins. In Step 1, we trained a classification model that estimates the probability of waiting 1 day or more. In Step 2, we identified subgroups with a higher probability of race-related differences. We tested three classification models for Step 1 and selected the one with the most accurate estimates.
    Client data were collected by states during personal interviews at admission and discharge.
    Random forest was the most accurate model for the first step of subgroup analysis. We found large variation across states in racial disparities. Stratified analysis of two states with the largest samples showed critical factors that augmented disparities beyond race. In California, factors such as service setting, referral source, and homelessness defined the subgroup most vulnerable to racial disparities. In Maryland, service setting, prior episodes, receipt of medication-assisted opioid treatment, and primary drug use frequency augmented disparities beyond race. The identified subgroups had significantly larger racial disparities.
    The methodology used in this study enabled a nuanced understanding of the complexities in disparities research. We found state and service factors that intersected with race and augmented disparities in wait time. Findings can help decision makers target modifiable factors that make subgroups vulnerable to waiting longer to enter treatment.
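The two-step virtual twins idea can be sketched as follows. Step 1 fits a model of P(wait) that includes race. Each episode is then scored twice, once under each race value, and the difference between the two predictions is kept. Step 2 looks for covariate levels where that difference is large. The study used a random forest in Step 1 and many TEDS-A fields. This sketch substitutes a smoothed cell-frequency model and a single hypothetical covariate (`setting`), so all names and records here are illustrative only.

```python
from collections import defaultdict

# Hypothetical stand-ins: one covariate and a smoothed cell-frequency model
# replace the study's TEDS-A fields and random forest.
COVARIATES = ("setting",)


def fit_cell_rates(records):
    """Step 1 stand-in: estimate P(wait >= 1 day | race, covariates)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["race"],) + tuple(r[c] for c in COVARIATES)
        totals[key] += 1
        hits[key] += r["waited"]

    def predict(race, row):
        key = (race,) + tuple(row[c] for c in COVARIATES)
        return (hits[key] + 1) / (totals[key] + 2)  # Laplace-smoothed rate

    return predict


def twin_differences(records, predict, race_a, race_b):
    """Predict each episode under both race values and keep the difference."""
    return [predict(race_a, r) - predict(race_b, r) for r in records]


def subgroup_means(records, diffs, by):
    """Step 2 stand-in: mean twin difference for each level of one covariate."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r, d in zip(records, diffs):
        sums[r[by]] += d
        counts[r[by]] += 1
    return {level: sums[level] / counts[level] for level in sums}


raw = [("A", "residential", 1), ("A", "residential", 1), ("A", "residential", 1),
       ("A", "residential", 0), ("B", "residential", 0), ("B", "residential", 0),
       ("B", "residential", 1), ("B", "residential", 0), ("A", "outpatient", 1),
       ("A", "outpatient", 0), ("B", "outpatient", 1), ("B", "outpatient", 0)]
records = [{"race": race, "setting": s, "waited": w} for race, s, w in raw]

predict = fit_cell_rates(records)
diffs = twin_differences(records, predict, "A", "B")
means = subgroup_means(records, diffs, "setting")
print(means)
```

In this toy data the residential setting shows a positive twin difference (about 1/3) and the outpatient setting shows none. Grouping by one covariate amounts to a one-split tree. The full method would instead fit a regression or classification tree to the differences, searching over all covariates.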

  • Article type: Journal Article
    Acid phosphatase produced by plants and microbes plays a fundamental role in the recycling of soil phosphorus (P). Quantifying the spatial variation in potential acid phosphatase activity (AP) and its drivers at large spatial scales can help reduce the uncertainty in our understanding of the bioavailability of soil P. We applied two machine-learning methods (random forests and back-propagation artificial neural networks) to simulate the spatial patterns of AP across Europe. To do so, we scaled up 126 site observations of potential AP activity, measured in the laboratory on field samples, using 12 environmental drivers as predictors. The back-propagation artificial neural network (BPN) method explained 58% of AP variability, more than the regression tree model (49%). In addition, BPN was able to identify gradients in AP along three transects in Europe. Partial correlation analysis revealed that soil nutrients (total nitrogen, total P, and labile organic P) and climatic controls (annual precipitation, mean annual temperature, and temperature amplitude) were the dominant factors influencing spatial variation in AP. Higher AP occurred in regions with higher mean annual temperature, higher precipitation, and higher soil total nitrogen. Soil total P (TP) and labile organic P (Po) were non-monotonically correlated with modeled AP for Europe, indicating different strategies of P utilization by biomes in arid and humid areas. This study helps to separate the influence of each factor on AP production and to reduce the uncertainty in estimating soil P availability. However, the BPN model trained with European data could not produce a robust global map of AP, owing to the lack of representative AP measurements for tropical regions. Filling this data gap will help us understand the physiological basis of P-use strategies in natural soils.
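Partial correlation measures how strongly two variables are associated once a third is held fixed. For one control variable z it is r_xy·z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below implements this first-order form. The study may have controlled for several drivers at once; higher-order partial correlations can be built recursively or from regression residuals. The series are hypothetical: x and y both follow z plus noise terms chosen to be uncorrelated with z and with each other.

```python
import math

# Hypothetical toy series: x and y (e.g. AP and a soil nutrient) both track z
# (e.g. a climate driver) plus noise uncorrelated with z and with each other.


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


z = [1, 2, 3, 4, 5]
x = [2, 1, 3, 3, 6]  # z + [1, -1, 0, -1, 1]
y = [0, 4, 3, 2, 6]  # z + [-1, 2, 0, -2, 1]
print(round(pearson(x, y), 3))  # -> 0.598, driven entirely by the shared z
print(partial_corr(x, y, z))    # close to 0 once z is held fixed
```

The raw correlation of about 0.6 disappears after conditioning on z. This is the kind of confounding that partial correlation helps separate when several climatic and soil drivers co-vary.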

  • Article type: Journal Article
    River networks in subtropical agricultural hilly regions have become a non-negligible source of greenhouse gases (GHGs; methane and nitrous oxide) because of human activities. This creates large uncertainties in refining national GHG inventories and the global GHG budget. Based on field monitoring experiments at high temporal resolution, we employed regression tree and importance analyses to quantitatively identify the factors that influence the diffusive fluxes of GHGs, providing a scientific basis for reducing GHG emissions and controlling regional carbon and nitrogen losses. The results indicate significant spatiotemporal variation in methane (CH4) and nitrous oxide (N2O) diffusion in all four reaches (W1, W2, W3, and W4) of the Tuojia River network. Among them, W1 had lower CH4 (22.55 μg C m-2 h-1) and N2O (5.00 μg N m-2 h-1) diffusive fluxes than the other three reaches (P < 0.05), whereas W4 had the highest CH4 (166.15 μg C m-2 h-1) and N2O (30.47 μg N m-2 h-1) diffusive fluxes. W2 and W3 did not differ significantly, owing to similar external nutrient loading into the two reaches. W4 also contributed the largest cumulative fluxes of CH4 (14.55 kg C ha-1 yr-1) and N2O (2.69 kg N ha-1 yr-1) in the Tuojia River network (P < 0.05). Furthermore, the regression tree and importance analyses indicate that, in the anaerobic environment, dissolved oxygen saturation controlled the production and diffusion of both CH4 and N2O. These findings highlight that decision support tools provide an effective pathway for advancing GHG mitigation research in agroecosystems. They also inform the global effort to refine national GHG inventories as well as regional nutrient management.
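The mean hourly fluxes and the cumulative annual fluxes reported for W4 are linked by a unit conversion: 1 μg m-2 h-1 = 10^-9 kg × 10^4 m2 ha-1 × 8760 h yr-1 = 0.0876 kg ha-1 yr-1. The sketch below applies it, assuming the mean rate holds for every hour of a 365-day year. It reproduces the CH4 figure (14.55) and gives about 2.67 for N2O, against the reported 2.69. The gap is presumably because the study integrated measured fluxes over its sampling record rather than scaling a single mean.

```python
# Unit conversion only; assumes the mean hourly flux applies to every hour of a
# 365-day year, a simplification of integrating a measured time series.
UG_TO_KG = 1e-9  # 1 ug = 1e-9 kg
M2_PER_HA = 1e4  # 1 ha = 10,000 m2
HOURS_PER_YEAR = 24 * 365


def hourly_to_annual(flux_ug_m2_h):
    """Convert a diffusive flux from ug m-2 h-1 to kg ha-1 yr-1."""
    return flux_ug_m2_h * UG_TO_KG * M2_PER_HA * HOURS_PER_YEAR


print(round(hourly_to_annual(166.15), 2))  # W4 CH4 -> 14.55 (kg C ha-1 yr-1)
print(round(hourly_to_annual(30.47), 2))   # W4 N2O -> 2.67 (kg N ha-1 yr-1)
```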

  • Article type: English Abstract
    Increasingly frequent urban waterlogging disasters, mainly caused by the expansion of impervious surfaces during rapid urbanization, have attracted public attention. Green roofs increase the urban pervious surface area and thereby help control runoff at its source, which is of great significance for the ecological environment. This study used the green roof of the administrative building of Jinling Primary School in Nanjing as the study area. A total of 76 rainfall-runoff events collected over 17 months (June 2016 to October 2017) were used to evaluate the comprehensive runoff control ability of the green roof and the factors influencing it at the site scale. Based on life cycle assessment theory, the stormwater regulation benefits over the roof's 30-year life cycle were quantitatively evaluated. The results show that: ① The average retention of the green roof was 62.7%, which could significantly affect runoff and peak flow, reducing the runoff time and delaying the flood peak. ② The green roof has a strong ability to retain runoff during small and medium rainfall events. However, this ability becomes low when the retention capacity is saturated or has not fully recovered, even in small rainfall-runoff events. ③ The main factors affecting the retention ability of the green roof are total rainfall, rainfall intensity, and the water content of the growth substrate. ④ The green roof has great economic benefits, with a construction cost of about 12.51 yuan·m-3 and a return on investment of 0.41. The results of this study provide a scientific basis and decision-making reference for the planning and construction of green roofs and the promotion of related policies.
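Event retention is commonly expressed as the share of rainfall depth P that does not leave the roof as runoff depth R, that is, 100 (P - R) / P. The abstract does not say whether the 62.7% average is a mean of per-event values or a ratio of event totals, so this sketch computes both. The three events are hypothetical.

```python
# Green-roof retention from rainfall and runoff depths (mm).
# The events below are hypothetical; the study used 76 monitored events.


def event_retention(rain_mm, runoff_mm):
    """Percentage of one event's rainfall retained by the roof."""
    if rain_mm <= 0:
        raise ValueError("rainfall depth must be positive")
    return 100 * (rain_mm - runoff_mm) / rain_mm


def summarize(events):
    """Return (mean of per-event retention, retention of summed totals)."""
    per_event = [event_retention(p, q) for p, q in events]
    mean_event = sum(per_event) / len(per_event)
    total_rain = sum(p for p, _ in events)
    total_runoff = sum(q for _, q in events)
    cumulative = 100 * (total_rain - total_runoff) / total_rain
    return mean_event, cumulative


events = [(5.0, 0.0), (20.0, 6.0), (60.0, 36.0)]  # (rainfall, runoff) in mm
mean_event, cumulative = summarize(events)
print(round(mean_event, 1), round(cumulative, 1))  # -> 70.0 50.6
```

The two summaries diverge because small events, which the roof retains almost completely, weigh equally in the per-event mean but contribute little depth to the totals. This matches the abstract's point that retention is strong for small and medium rainfall and weakens once the substrate is saturated.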

  • Article type: Editorial
    Randomized controlled trials (RCTs) usually enroll heterogeneous study populations, so it is of interest to identify subgroups of patients for whom the treatment may be beneficial or harmful. A variety of methods have been developed for this kind of post hoc analysis. A conventional generalized linear model can include prognostic variables as main effects and predictive variables in interactions with the treatment variable. A statistically significant and large interaction effect usually indicates potential subgroups that may respond differently to the treatment. However, the conventional regression method requires the interaction terms to be specified. This requires knowledge of the predictive variables, or becomes infeasible when there is a large number of feature variables. The Least Absolute Shrinkage and Selection Operator (LASSO) method performs variable selection by shrinking less clear effects (including interaction effects) to zero, and in this way selects only certain variables and interactions for the model. There are many tree-based methods for subgroup identification. For example, model-based recursive partitioning incorporates parametric models, such as generalized linear models, into trees. The incorporated model is usually a simple one with the treatment as its only covariate; predictive and prognostic variables are then found and incorporated automatically via the tree. The present article gives an overview of these methods and explains how to perform them using R (version 3.3.2), the free software environment for statistical computing. A simulated dataset is used to illustrate the performance of these methods.
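The LASSO approach to interactions can be sketched with the treatment T, two hypothetical covariates X1 and X2, and their treatment interactions as candidate terms. The article works in R. This Python sketch shows only the objective (1/2n) ||y - Xb||^2 + lambda ||b||_1, minimized by cyclic coordinate descent with soft-thresholding; it does not reproduce any R package's interface. The simulated data and lambda = 0.1 are assumptions: a noise-free 2^3 design with T, X1, and X2 coded as -1/+1. Because the design columns are orthogonal, each coefficient equals its least-squares value shrunk toward zero by lambda.

```python
from itertools import product

# Hypothetical simulated trial: T, X1, X2 coded -1/+1 in a full 2^3 design.
# True model: prognostic X1, treatment T, and a predictive T*X1 interaction.


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    Assumes centered y and centered columns, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual excluding j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


names = ["T", "X1", "X2", "T*X1", "T*X2"]
rows = list(product((-1, 1), repeat=3))  # (T, X1, X2)
X = [[t, x1, x2, t * x1, t * x2] for t, x1, x2 in rows]
y = [1.0 * t + 0.5 * x1 + 2.0 * t * x1 for t, x1, x2 in rows]
coef = dict(zip(names, lasso_cd(X, y, lam=0.1)))
print({k: round(v, 3) for k, v in coef.items()})
# -> {'T': 0.9, 'X1': 0.4, 'X2': 0.0, 'T*X1': 1.9, 'T*X2': 0.0}
```

The nonzero T*X1 coefficient flags X1 as a predictive variable, so patients with X1 = +1 and X1 = -1 respond differently to treatment. The null terms X2 and T*X2 are removed entirely. With real, correlated covariates lambda would normally be chosen by cross-validation rather than fixed.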

  • Article type: Journal Article
    Heavy metal accumulations in urban soils are highly spatially heterogeneous and are affected by multiple factors, including soil properties, land use and its spatial pattern, population, and climatic conditions. We studied the accumulation risks of Cd, Cu, Pb, and Zn in urban soils of Beijing and their influencing factors, based on regression tree analysis and a GIS-based overlay model. The results show that Zn causes the most extensive soil pollution and Cu the most acute. Soil organic carbon content, cation exchange capacity (CEC), and population growth are the most significant factors affecting heavy metal accumulation. Other factors, including land use pattern, urban landscape, and wind speed, also contributed, but less markedly. Soils in areas with a higher degree of urbanization and surrounded by intense vehicular traffic have a higher accumulation risk of Cd, Cu, Pb, and Zn.
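The abstract does not detail its GIS-based overlay model. A common form is a weighted linear overlay, in which each factor layer is rescaled to 0-1 and the layers are combined cell by cell as a weighted mean. This sketch assumes that form. The two 2 × 2 grids (degree of urbanization and traffic density) and the weights 0.6 and 0.4 are hypothetical. In practice, weights might be derived from, for example, regression tree importance scores.

```python
# Weighted linear overlay on small raster grids (lists of lists).
# Layers and weights are hypothetical illustrations, not the study's inputs.


def min_max(grid):
    """Rescale a grid to the 0-1 range (all zeros if the grid is constant)."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]


def weighted_overlay(layers, weights):
    """Cell-wise weighted mean of rescaled layers; higher means higher risk."""
    total = sum(weights.values())
    scaled = {name: min_max(grid) for name, grid in layers.items()}
    first = next(iter(layers.values()))
    n_rows, n_cols = len(first), len(first[0])
    return [
        [sum(weights[n] * scaled[n][r][c] for n in layers) / total
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


layers = {
    "urbanization": [[0.2, 0.5], [0.8, 1.0]],  # hypothetical degree of urbanization
    "traffic": [[10, 40], [30, 50]],           # hypothetical traffic density
}
weights = {"urbanization": 0.6, "traffic": 0.4}
risk = weighted_overlay(layers, weights)
for row in risk:
    print([round(v, 3) for v in row])
```

The highest-risk cell is the one that is both most urbanized and most trafficked, consistent with the pattern the abstract reports. Min-max scaling puts every layer on the same range, so the weights alone set each factor's influence.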