Models, Statistical

  • Article type: Journal Article
    OBJECTIVE: Pancreatic cancer (PC) has a high prevalence and mortality rate compared with other cancers. Although the survival rate for this cancer type is low, early prediction of the disease plays a crucial role in decreasing mortality and improving prognosis. So, this study.
    METHODS: In this retrospective study, we used 654 PC cases, both surviving and deceased, to build a prediction model for PC. Six selected machine learning algorithms and a set of prognostic factors were used to build the prediction models. The importance of the predictive factors was assessed using the relative importance reported by a high-performing algorithm.
    RESULTS: XG-Boost, with an AU-ROC of 0.933 (95% CI = [0.906-0.958]) in internal validation and 0.836 (95% CI = [0.789-0.865]) in external validation, was considered the best-performing model for predicting the mortality risk of PC. Tumor size, smoking, and chemotherapy were considered the most influential factors for prediction.
    CONCLUSIONS: XG-Boost achieved the best performance in predicting the mortality risk of PC patients, so this model can support clinical solutions that doctors can apply in healthcare settings to decrease the mortality risk of these patients.
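    The abstract reports AU-ROC values with 95% confidence intervals but does not say how the intervals were obtained. As a rough illustration only, the sketch below computes an AU-ROC from the rank-based definition and a percentile bootstrap interval. The bootstrap choice, the function names, and the toy labels and scores are all assumptions, not the authors' procedure or data.

```python
import random

def auroc(labels, scores):
    """AUROC as the chance that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = died, 0 = alive, scores = predicted risk.
labels = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.5, 0.6, 0.3]
point = auroc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUROC = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

    With a real model, the scores would be the predicted mortality probabilities on the internal or external validation set.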

  • Article type: Journal Article
    Spatial interaction models with spatial origin-destination (OD) filters are powerful tools for characterizing trip flows in space, a classic and important problem in regional science. To the authors' knowledge, existing studies that adopt OD filters mostly specify the spatial dependence as an autoregressive process, which may not give the full picture of spatial effects. To examine the problem, this paper proposes three hypotheses: 1) spatial OD dependence can occur in both the spatial autoregressive term and the spatial error term of a spatial interaction model; 2) estimating a spatial autoregressive model with spatial autoregressive disturbances (SARAR) with OD filters would disentangle where the spatial dependence exists and how strong it is; and 3) analysts would prefer the marginal effects obtained from SARAR models when these models statistically outperform spatial autoregressive (SAR) models and spatial error models (SEM). To assess these hypotheses, this paper specifies, estimates, and applies SARAR models with OD filters to investigate trip distributions. It compares the estimation results of SAR, SEM, and SARAR models using empirical data collected from Hangzhou, China. The contribution of this paper is to be the first to develop a SARAR model with OD filters for trip distribution analysis and to examine its performance.
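    For orientation, the generic textbook form of a SARAR model is shown below, together with the two restrictions that reduce it to SAR and SEM. The abstract gives no equations, so the symbols are assumptions: $y$ is the vector of flows, $X$ the covariates, and $W$ and $M$ are spatial weight matrices. How the OD filters enter these matrices is not stated in the abstract and is not shown here.

```latex
\begin{aligned}
y &= \rho W y + X\beta + u, \\
u &= \lambda M u + \varepsilon, \qquad \varepsilon \sim (0, \sigma^2 I), \\[4pt]
\text{SAR:}\quad & \lambda = 0, \qquad \text{SEM:}\quad \rho = 0 .
\end{aligned}
```

    Under this nesting, estimating both $\rho$ and $\lambda$ indicates whether the dependence sits in the autoregressive term, the error term, or both, which is the disentangling described in hypothesis 2.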

  • Article type: Journal Article
    In almost every country, patents need to be renewed multiple times after they are granted. A patentee assesses the value of the patent and then pays a renewal fee to keep it active for another stipulated period. The factors that characterize the value of a patent are subjective. This paper aims to address the research gap in building an accurate model for predicting the renewal life of Indian patents (often considered a proxy for patent value) and in identifying the significant factors that influence renewal life. This study uses an extensive data set collected from the Indian Patent Office covering all granted patents filed between 1995 and 2005. Popular statistical and machine learning algorithms do not yield accurate predictive models, because the patent renewal life distribution (at least for Indian patents) shows unusual spikes at the two extreme values, which makes the modeling task more challenging. We propose a new two-stage hybrid model that combines an efficient multi-class classifier with a binomial regression model to predict the complex renewal data distribution. We conducted a comparative analysis of the proposed model against several state-of-the-art machine learning and statistical models. The results show that the proposed hybrid model gives 90% accuracy, whereas the best competitor gives only 40%.

  • Article type: Journal Article
    Risk prediction models fitted using maximum likelihood estimation (MLE) are often overfitted, resulting in predictions that are too extreme and a calibration slope (CS) less than 1. Penalized methods, such as Ridge and Lasso, have been suggested as a solution to this problem because they tend to shrink regression coefficients toward zero, resulting in predictions closer to the average. The amount of shrinkage is regulated by a tuning parameter, $\lambda$, commonly selected via cross-validation ("standard tuning"). Though penalized methods have been found to improve calibration on average, they often over-shrink and exhibit large variability in the selected $\lambda$ and hence in the CS. This is a problem, particularly for small sample sizes, but also when using sample sizes recommended to control overfitting. We consider whether these problems are partly due to selecting $\lambda$ using cross-validation with "training" datasets of reduced size compared to the original development sample, resulting in an over-estimation of $\lambda$ and, hence, excessive shrinkage. We propose a modified cross-validation tuning method ("modified tuning"), which estimates $\lambda$ from a pseudo-development dataset obtained via bootstrapping from the original dataset, albeit of larger size, such that the resulting cross-validation training datasets are of the same size as the original dataset. Modified tuning can be easily implemented in standard software and is closely related to bootstrap selection of the tuning parameter ("bootstrap tuning"). We evaluated modified and bootstrap tuning for Ridge and Lasso in simulated and real data using recommended sample sizes, and sizes slightly lower and higher. They substantially improved the selection of $\lambda$, resulting in improved CS compared to the standard tuning method. They also improved predictions compared to MLE.
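    The size of the pseudo-development dataset follows from simple arithmetic. With K-fold cross-validation each training split holds a fraction (K-1)/K of the rows, so a dataset of about nK/(K-1) rows gives training splits of about n rows. The sketch below is one reading of the abstract, not the authors' code: the function names are hypothetical, and the penalized-regression fit that would actually select $\lambda$ is left out.

```python
import math
import random

def pseudo_development_size(n, k):
    """Smallest m whose K-fold training splits all contain at least n rows.
    Integer ceiling of n * k / (k - 1)."""
    if k < 2:
        raise ValueError("k-fold cross-validation needs k >= 2")
    return -(-n * k // (k - 1))

def bootstrap_pseudo_dataset(rows, k, seed=0):
    """Draw the enlarged pseudo-development dataset by sampling the
    original rows with replacement."""
    rng = random.Random(seed)
    m = pseudo_development_size(len(rows), k)
    return [rng.choice(rows) for _ in range(m)]

# Hypothetical example: 200 development rows, 10-fold cross-validation.
n, k = 200, 10
m = pseudo_development_size(n, k)
smallest_training_split = m - math.ceil(m / k)
print(m, smallest_training_split)

pseudo = bootstrap_pseudo_dataset(list(range(n)), k)
```

    Standard K-fold cross-validation for $\lambda$ would then be run on `pseudo` in place of the original rows.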

  • Article type: Journal Article
    Observational studies are frequently used to estimate the effect of an exposure or treatment on an outcome. To obtain an unbiased estimate of the treatment effect, it is crucial to measure the exposure accurately. A common type of exposure misclassification is recall bias, which occurs in retrospective cohort studies when study subjects may inaccurately recall their past exposure. Particularly challenging is differential recall bias in the context of self-reported binary exposures, where the bias may be directional rather than random and its extent varies according to the outcomes experienced. This paper makes several contributions: (1) it establishes bounds for the average treatment effect even when a validation study is not available; (2) it proposes multiple estimation methods across various strategies predicated on different assumptions; and (3) it suggests a sensitivity analysis technique to assess the robustness of the causal conclusion, incorporating insights from prior research. The effectiveness of these methods is demonstrated through simulation studies that explore various model misspecification scenarios. These approaches are then applied to investigate the effect of childhood physical abuse on mental health in adulthood.

  • Article type: Journal Article
    Sports records play a crucial role in understanding the limits of human achievement in sports. However, a comprehensive analysis of various sports records using existing statistical models has been lacking. This study introduces a framework for analyzing the integrated features and evolutionary trends of 23 sports records for men and women. It includes world records and intercontinental records from six continents, covering 6440 athletes from 2001 to 2020. Our findings indicate that humans have not yet reached their limits in athletic performance, suggesting continued improvement over time. Furthermore, we investigated the contributions of our model's parameters to the integrated features, emphasizing their robustness and convergence in handling data flow and information entropy. Additionally, our model underscores the significance of integrating various sports for ongoing advancement, in line with the Olympic motto "Together," thereby promoting coordinated development.

  • Article type: Journal Article
    The Latent Dirichlet Allocation (LDA) model is used to extract the text themes of newspaper news and construct the Chinese Economic Policy Uncertainty (EPU) Index. Using data on Chinese A-share listed companies from 2008 to 2020, this paper then empirically analyzes the impact of EPU on the peer effects of firms' R&D investment, and finds that EPU aggravates these peer effects. Furthermore, the paper tests the moderating effect of managers' motivation to maintain their reputation on the way EPU influences the peer effects of firms' R&D investment, and verifies the mechanism by which EPU influences these peer effects through financial frictions.

  • Article type: Journal Article
    BACKGROUND: HALE is now a regular strategic planning indicator at all levels of the Chinese government. However, measuring HALE requires comprehensive data collection and intricate techniques. Effectively converting numerous diseases into the years lived with disability (YLD) rate is therefore a significant challenge for HALE measurement. Our study aimed to construct a simple, widely applicable YLD rate measurement model based on the data resources actually available in China, to address the challenge of measuring HALE target values during planning.
    METHODS: First, based on the Chinese YLD rate in the Global Burden of Disease (GBD) 2019, methods including Pearson correlation analysis and the global optimum method were used to screen the best predictor variables from current Chinese data resources. Missing data for predictor variables were filled in via spline interpolation. Then, multiple linear regression models were fitted to construct the YLD rate measurement model. The Sullivan method was used to measure HALE. The Monte Carlo method was employed to generate 95% uncertainty intervals. Finally, model performance was assessed using the mean absolute error (MAE) and mean absolute percentage error (MAPE).
    RESULTS: A three-input-parameter model was constructed to measure age-specific YLD rates by sex in China, directly using the incidence of infectious diseases and the incidence of chronic diseases among persons aged 15 and older, with the under-five mortality rate added as a covariate. The total MAE and MAPE for the combined YLD rate were 0.0007 and 0.5949%, respectively. The MAE and MAPE of the combined HALE in the 0-year-old group were 0.0341 and 0.0526%, respectively. The MAE and MAPE were slightly lower for males (0.0197, 0.0311%) than for females (0.0501, 0.0755%).
    CONCLUSIONS: We constructed a high-accuracy model to measure the YLD rate in China using three routine national monitoring indicators as predictor variables. Given limited data, the model provides a realistic and feasible solution for measuring HALE at the national and especially the regional level.
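    The two error measures named in the methods have standard definitions: MAE is the mean of |actual - predicted|, and MAPE is the mean of |actual - predicted| / |actual| expressed as a percentage. A minimal sketch follows. The function names and the toy rates are made up for illustration and are not the study's data.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Undefined when any actual value is zero."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical reference rates versus model output.
reference = [0.10, 0.20, 0.40]
modelled = [0.11, 0.19, 0.42]
err_abs = mae(reference, modelled)
err_pct = mape(reference, modelled)
print(f"MAE = {err_abs:.4f}, MAPE = {err_pct:.4f}%")
```

    In the study's setting, the reference values would presumably be the GBD 2019 YLD rates and the modelled values the regression output.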

  • Article type: Journal Article
    Well-known continuous distributions such as the Beta and Kumaraswamy distributions are useful for modeling data sets on the unit interval [0,1]. However, no single distribution suits every type of data set; suitability also depends on the shape of the data. In this research, a new three-parameter distribution, the bounded exponentiated Weibull (BEW) distribution, is defined to model data sets supported on the unit interval [0,1]. Some fundamental distributional properties of the BEW distribution are investigated. To model dependence between measures in a data set, a bivariate extension of the BEW distribution is developed, and graphical shapes of the bivariate BEW distribution are shown. Several estimation methods for the parameters of the BEW distribution are discussed, and a Monte Carlo simulation study is carried out to check the performance of the estimators. The applications of the BEW distribution are then illustrated using COVID-19 data sets. The proposed distribution shows a better fit than many well-known distributions. Lastly, a quantile regression model based on the bounded exponentiated Weibull distribution is developed, and graphical shapes of its probability density function (PDF) and hazard function are shown.

  • Article type: Journal Article
    Haseman-Elston regression (HE-reg) has been known as a classic tool for detecting an additive genetic variance component. However, in this study we find that HE-reg can capture GxE under certain conditions, so we derive and reinterpret the analytical solution of HE-reg. In the presence of GxE, this leads to a natural discrepancy between linkage and association results, because association cannot capture GxE if the environment is unknown. Considering linkage and association as symmetric designs, we investigate how the symmetry can and cannot hold in the absence and presence of GxE, and consequently we propose a pair of statistical tests, Symmetry Test I and Symmetry Test II, both of which can be carried out using summary statistics. The test statistics and their statistical power are also investigated for Symmetry Tests I and II. Increasing the number of sib pairs is important for improving the statistical power to detect GxE.