model evaluation

  • Article type: Journal Article
    Against the backdrop of global warming, rapid urbanization has concentrated urban building space and intensified the heat island effect, hindering sustainable urban development. To investigate the potential of green roofs in different types of neighborhoods to mitigate the urban heat island effect, and the methods for doing so, this study used multivariate data for surface temperature inversion and local climate zones (LCZs), and evaluated the potential of green roofs to reduce the heat island effect by combining LCZ zoning with the ENVI-met prediction model. Finally, a multi-scenario analysis that includes economic factors was conducted to derive the optimal implementation path for green roofs. The results show that, across LCZs 1-9, green roofs can reduce the daytime average air temperature by at most 0.41 °C at the 0.5 m roof level in LCZ8 and by at most 0.37 °C at the 1.2 m pedestrian level in LCZ6. Ranking the LCZs by the surface cooling efficiency of their green roofs gives the best construction order: LCZ3, LCZ6, LCZ8 > LCZ2, LCZ5, LCZ7 > LCZ1, LCZ4, LCZ9. Building green roofs in the heat island areas within the Fifth Ring Road of Beijing can reduce the area of high-temperature and sub-high-temperature zones by 52.55% and 29.17%, respectively, compared with no green roof construction. The study clarifies a technical methodology for assessing the cooling efficiency of green roofs in different types of neighborhoods and for reducing the urban-scale heat island effect, and provides a reference for planning green roofs on urban buildings.

  • Article type: Journal Article
    The Tucker-Lewis index (TLI; Tucker & Lewis, 1973), also known as the non-normed fit index (NNFI; Bentler & Bonett, 1980), is one of the many incremental fit indices widely used in linear mean and covariance structure modeling, particularly in exploratory factor analysis, both of which are popular tools in prevention research. It augments the information provided by other indices such as the root-mean-square error of approximation (RMSEA). In this paper, we develop and examine an analogous index for categorical item-level data modeled with item response theory (IRT). The proposed Tucker-Lewis index for IRT (TLIRT) is based on Maydeu-Olivares and Joe's (2005) [Formula: see text] family of limited-information overall model fit statistics. Under realistic conditions, the limited-information fit statistics have significantly better chi-square approximation and power than the traditional full-information Pearson or likelihood ratio statistics. Building on the incremental fit assessment principle, the TLIRT places the fit of the model under consideration along a spectrum from the worst to the best possible model fit. We examine the performance of the new index using simulated and empirical data. Results from a simulation study suggest that the new index behaves as theoretically expected and that it can offer insights about model fit not available from other sources. In addition, a more stringent cutoff value is perhaps needed than Hu and Bentler's (1999) traditional cutoff criterion for continuous variables. In the empirical data analysis, we use a data set from a measurement development project in support of cigarette smoking cessation research to illustrate the usefulness of the TLIRT. We noticed that, had we used only the RMSEA index, we could have arrived at qualitatively different conclusions about model fit depending on the choice of test statistic, an issue to which the TLIRT is relatively more immune.
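
For orientation, the classical TLI that this abstract builds on compares the fit statistic per degree of freedom of a baseline (null) model with that of the model under study. The sketch below implements only that classical formula. The function name and the example numbers are hypothetical, and treating the limited-information statistic as a drop-in replacement for the chi-square value is an assumption about how the TLIRT is assembled, not something the abstract spells out.

```python
def tucker_lewis_index(stat_null, df_null, stat_model, df_model):
    """Classical TLI: (s0/df0 - s1/df1) / (s0/df0 - 1).

    stat_null / df_null  : fit statistic and degrees of freedom of the baseline model
    stat_model / df_model: the same quantities for the model under consideration
    """
    if df_null <= 0 or df_model <= 0:
        raise ValueError("degrees of freedom must be positive")
    ratio_null = stat_null / df_null
    ratio_model = stat_model / df_model
    if ratio_null == 1.0:
        raise ValueError("baseline ratio of 1 makes the index undefined")
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


# Hypothetical values, chosen only to exercise the formula.
tli = tucker_lewis_index(stat_null=1000.0, df_null=45, stat_model=50.0, df_model=35)
print(round(tli, 4))
```

Because the index is not normed, it can fall below 0 or exceed 1. A model whose statistic equals its degrees of freedom yields exactly 1.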

  • Article type: Journal Article
    Forecasting models have been influential in shaping decision-making in the COVID-19 pandemic. However, there is concern that their predictions may have been misleading. Here, we dissect the predictions made by four models for the daily COVID-19 death counts between March 25 and June 5 in New York state, as well as the predictions of ICU bed utilisation made by the influential IHME model. We evaluated the accuracy of the point estimates and the accuracy of the uncertainty estimates of the model predictions. First, we compared the "ground truth" data sources on daily deaths against which these models were trained. Three different data sources were used by these models, and these had substantial differences in recorded daily death counts. Two additional data sources that we examined also provided different death counts per day. For accuracy of prediction, all models fared very poorly. Only 10.2% of the predictions fell within 10% of their training ground truth, irrespective of distance into the future. For accurate assessment of uncertainty, only one model matched relatively well the nominal 95% coverage, but that model did not start predictions until April 16, and thus had no impact on early, major decisions. For ICU bed utilisation, the IHME model was highly inaccurate; the point estimates only started to match ground truth after the pandemic wave had started to wane. We conclude that trustworthy models require trustworthy input data to be trained upon. Moreover, models need to be subjected to prespecified real-time performance tests before their results are provided to policy makers and public health officials.
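
The two checks described in this abstract (how many point predictions land within 10% of the ground truth, and how often the stated 95% interval contains the observed value) can be sketched as below. This is a minimal illustration, not the authors' code. The function names and all numbers are hypothetical.

```python
def share_within_tolerance(predicted, observed, tolerance=0.10):
    """Fraction of predictions whose absolute error is at most `tolerance` times the observed value."""
    hits = sum(
        1 for p, o in zip(predicted, observed)
        if abs(p - o) <= tolerance * abs(o)
    )
    return hits / len(observed)


def interval_coverage(lower, upper, observed):
    """Fraction of observed values that fall inside their prediction interval."""
    covered = sum(1 for lo, hi, o in zip(lower, upper, observed) if lo <= o <= hi)
    return covered / len(observed)


# Made-up daily counts, for illustration only.
observed = [100, 200, 300, 400]
predicted = [105, 260, 290, 500]
lower = [90, 150, 250, 450]
upper = [120, 250, 310, 600]

print(share_within_tolerance(predicted, observed))  # share of point estimates within 10%
print(interval_coverage(lower, upper, observed))    # compare against the nominal 0.95
```

A well-calibrated 95% interval should give an empirical coverage close to 0.95 over many predictions. Coverage far below that indicates the model understates its uncertainty.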

  • DOI:
    Article type: Journal Article
    Collection and analysis of students' writing samples on a large scale is a part of the research agenda of the emerging writing analytics community that promises to deliver an unprecedented insight into characteristics of student writing. Yet with a large scale often comes variability of contexts in which the samples were produced: different institutions, different purposes of writing, different author demographics, to name just a few possible dimensions of variation. What are the implications of such variation for the ability of automated methods to create indices/features based on the writing samples that would be valid and meaningful? This paper presents a case study in system generalization. Building on a system developed to assess the expression of utility value (a social-psychology-based construct) in essays written by first-year biology students at one postsecondary institution, we vary data parameters and observe system performance. From the point of view of social psychology, all these variants represent the same underlying construct (i.e., utility value), and it is thus very tempting to think that an automatically produced utility-value score could provide a meaningful analytic, consistently, on a large collection of essays. However, findings from this research show that there are challenges: Some variations are easier to deal with than others, and some components of the automated system generalize better than others. The findings are then discussed both in the context of the case study and more generally.

  • Article type: Journal Article
    Previously developed models for predicting absolute risk of invasive epithelial ovarian cancer have included a limited number of risk factors and have had low discriminatory power (area under the receiver operating characteristic curve (AUC) < 0.60). Because of this, we developed and internally validated a relative risk prediction model that incorporates 17 established epidemiologic risk factors and 17 genome-wide significant single nucleotide polymorphisms (SNPs) using data from 11 case-control studies in the United States (5,793 cases; 9,512 controls) from the Ovarian Cancer Association Consortium (data accrued from 1992 to 2010). We developed a hierarchical logistic regression model for predicting case-control status that included imputation of missing data. We randomly divided the data into an 80% training sample and used the remaining 20% for model evaluation. The AUC for the full model was 0.664. A reduced model without SNPs performed similarly (AUC = 0.649). Both models performed better than a baseline model that included age and study site only (AUC = 0.563). The best predictive power was obtained in the full model among women younger than 50 years of age (AUC = 0.714); however, the addition of SNPs increased the AUC the most for women older than 50 years of age (AUC = 0.638 vs. 0.616). Adapting this improved model to estimate absolute risk and evaluating it in prospective data sets is warranted.
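
The evaluation design in this abstract (a random 80/20 split, then the AUC on the held-out 20%) can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names, the seed, and the toy labels and scores are all hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than with whatever software the authors used.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and hold out `test_fraction` of them for evaluation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(labels, scores):
    """Probability that a random case (label 1) scores higher than a random control (label 0).

    Ties count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy held-out labels and predicted risks, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.4]
print(round(auc(labels, scores), 3))

train, test = train_test_split(range(10))
print(len(train), len(test))
```

An AUC of 0.5 corresponds to chance-level discrimination, which is why values such as 0.563 for the baseline model indicate only weak separation of cases from controls.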
