Information criterion

  • Article type: Letter
    Clinical trial endpoints are often bounded outcome scores (BOS), which are variables having restricted values within finite intervals. Common analysis approaches may treat the data as continuous, categorical, or a mixture of both. Because BOS data appear to be simultaneously continuous and categorical, confusion easily arises in pharmacometrics regarding the appropriate domain for model evaluation and the circumstances under which data likelihoods can be compared. This commentary aims to clarify these fundamental issues and facilitate appropriate pharmacometric analyses.
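    To make the continuous-versus-categorical distinction concrete, the following is a minimal sketch (not from the commentary) that evaluates the same hypothetical 0 to 5 BOS data under a normal density and under the corresponding discretized model; all data and parameter choices are assumptions for illustration.

```python
# Minimal sketch: log-likelihood of bounded outcome scores under
# (a) a continuous normal density and (b) the corresponding discretized model
# that integrates the same density over each score's interval. The two are only
# on a comparable scale once the continuous model is evaluated as a probability
# over the same discrete support.
import numpy as np
from scipy.stats import norm

scores = np.array([0, 1, 2, 3, 3, 4, 5, 5, 5])   # hypothetical BOS data on 0..5
mu, sigma = scores.mean(), scores.std(ddof=1)

# (a) continuous treatment: sum of log densities (scale-dependent)
ll_density = norm.logpdf(scores, mu, sigma).sum()

# (b) categorical treatment: probability mass of each unit-wide bin,
# obtained by integrating the same normal density over [k - 0.5, k + 0.5]
p_bins = norm.cdf(scores + 0.5, mu, sigma) - norm.cdf(scores - 0.5, mu, sigma)
ll_mass = np.log(p_bins).sum()

print(f"log-density sum: {ll_density:.2f}  log-probability sum: {ll_mass:.2f}")
```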

  • Article type: Journal Article
    Many cold-water-dependent aquatic organisms are experiencing habitat and population declines from increasing water temperatures. Identifying the mechanisms that drive local and regional stream thermal regimes facilitates restoration at ecologically relevant scales. Stream temperatures vary spatially and temporally both within and among river basins. We developed a modeling process to identify statistical relationships between drivers of stream temperature and covariates representing landscape, climate, and management-related processes. The modeling process was tested in 3 study areas of the Pacific Northwest, USA, during the growing season (May [start], August [warmest], September [end]). Across all months and study systems, the covariates with the highest relative importance represented the physical landscape (elevation [1st], catchment area [3rd], main channel slope [5th]) and climate (mean monthly air temperature [2nd] and discharge [4th]). Two management covariates (groundwater use [6th] and riparian shade [7th]) also had high relative importance. Across the growing season (for all basins), local reach slope had high relative importance in May but transitioned to a regional main channel slope covariate in August and September. This modeling process identified regionally similar and locally unique relationships among drivers of stream temperature. The high relative importance of management-related covariates suggested potential restoration actions for each system.
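    As a rough illustration of relative-importance ranking, the sketch below sums Akaike weights over an all-subsets regression on synthetic data; the study's actual models, covariates, and data differ, and the covariate names here are placeholders.

```python
# Simplified sketch of covariate relative importance via summed Akaike weights.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
covs = {"elevation": rng.normal(size=n),
        "air_temp": rng.normal(size=n),
        "discharge": rng.normal(size=n)}
y = 10 - 2.0 * covs["elevation"] + 1.5 * covs["air_temp"] + rng.normal(size=n)

# Fit every covariate subset and record each model's AIC
results = []
names = list(covs)
for k in range(len(names) + 1):
    for subset in itertools.combinations(names, k):
        X = sm.add_constant(np.column_stack([covs[c] for c in subset])) if subset else np.ones((n, 1))
        results.append((subset, sm.OLS(y, X).fit().aic))

# Akaike weights; per-covariate importance = sum of weights of models containing it
aics = np.array([a for _, a in results])
w = np.exp(-0.5 * (aics - aics.min()))
w /= w.sum()
importance = {c: sum(wi for (subset, _), wi in zip(results, w) if c in subset) for c in names}
print(importance)
```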

  • Article type: Journal Article
    A noise-resistant linearization model that reveals the true nonlinearity of the sensor is essential for retrieving accurate physical displacement from the signals captured by sensing electronics. In this paper, we propose a novel information-driven smoothing spline linearization method, which innovatively integrates one new and three standard information criteria into a smoothing spline for the linearization of high-precision displacement sensors. Using theoretical analysis and Monte Carlo simulation, the proposed linearization method is shown to outperform traditional polynomial and spline linearization methods for high-precision displacement sensors with a low noise-to-range ratio on the order of 10⁻⁵. Validation experiments were carried out on two different types of displacement sensors to benchmark the performance of the proposed method against polynomial models and the non-smoothing cubic spline. The results show that the proposed method with the new modified Akaike information criterion stands out among the linearization methods and can improve the residual nonlinearity by over 50% compared to the standard polynomial model. After linearization via the proposed method, the residual nonlinearity is as low as ±0.0311% F.S. (full scale) for the 1.5 mm range chromatic confocal displacement sensor and ±0.0047% F.S. for the 100 mm range laser triangulation displacement sensor.
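    The paper's modified Akaike information criterion is defined there; as a loose illustration of the general idea, the sketch below selects the smoothing factor of a scipy cubic smoothing spline by a generic AIC on synthetic sensor data, using the number of spline coefficients as a crude complexity proxy.

```python
# Rough sketch (not the paper's criterion): choose the smoothing parameter of a
# cubic smoothing spline by a generic AIC on synthetic displacement-sensor data.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.5, 300)                      # commanded displacement, mm (synthetic)
y = x + 0.002 * np.sin(6 * np.pi * x) + rng.normal(scale=1e-4, size=x.size)  # sensor response

best = None
for s in np.logspace(-8, -4, 40):                   # candidate smoothing factors
    spl = UnivariateSpline(x, y, k=3, s=s)
    rss = float(np.sum((y - spl(x)) ** 2))
    k_eff = len(spl.get_coeffs())                   # crude complexity proxy
    aic = x.size * np.log(rss / x.size) + 2 * k_eff
    if best is None or aic < best[0]:
        best = (aic, s, spl)

aic, s_opt, spl = best
residual = y - spl(x)
print(f"selected s = {s_opt:.2e}, residual nonlinearity ≈ ±{100 * np.abs(residual).max() / 1.5:.4f} %F.S.")
```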

  • Article type: Journal Article
    Over the years, adsorption has garnered considerable attention as one of the most cost-effective and efficient methods for separating contaminants from the liquid phase. A comprehensive understanding of adsorption mechanisms entails several crucial steps, including adsorbent characterization, batch and column adsorption tests, fitting of predefined kinetic and isotherm models, and meticulous thermodynamic analysis. These combined efforts provide clarity and insight into the intricate workings of adsorption phenomena. However, the vast amount of literature published in the field each year is riddled with ill-considered model selections and incorrect parameter analyses. Therefore, the aim of this paper is to establish guidelines for the proper employment of these numerous kinetic, isotherm, and fixed-bed models in various applications. A thorough review has been undertaken, encompassing more than 45 kinetic models, 70 isotherm models, and 45 fixed-bed models available to date, with their classification determined by the adsorption mechanisms expounded within each of them. Moreover, five general approaches for modifying fixed-bed models are provided. The physical meanings, assumptions, and interconversion relationships of the models are discussed in detail, along with the information criteria used to evaluate their validity. In addition to the commonly used activation energy and Gibbs energy analyses, methods for calculating the site energy distribution are also summarized.
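    As a small illustration of information-criterion-based isotherm model selection (not taken from the review), the sketch below fits Langmuir and Freundlich isotherms to synthetic equilibrium data with scipy and compares them by AICc.

```python
# Illustrative sketch: compare two common isotherm models by AICc on synthetic data.
import numpy as np
from scipy.optimize import curve_fit

def langmuir(Ce, qmax, KL):
    return qmax * KL * Ce / (1.0 + KL * Ce)

def freundlich(Ce, KF, n):
    return KF * Ce ** (1.0 / n)

Ce = np.array([0.5, 1, 2, 5, 10, 20, 40, 80])                    # mg/L (synthetic)
qe = np.array([8.2, 14.5, 23.1, 36.4, 45.0, 51.2, 55.0, 57.1])   # mg/g (synthetic)

def aicc(y, yhat, k):
    n = y.size
    rss = np.sum((y - yhat) ** 2)
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)                   # small-sample correction

for name, model, p0 in [("Langmuir", langmuir, (60, 0.1)),
                        ("Freundlich", freundlich, (10, 2))]:
    popt, _ = curve_fit(model, Ce, qe, p0=p0)
    print(name, popt, "AICc =", round(aicc(qe, model(Ce, *popt), k=len(popt)), 2))
```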

  • Article type: Journal Article
    BACKGROUND: Correctly identifying spatial disease clusters is a fundamental concern in public health and epidemiology. The spatial scan statistic is widely used for detecting spatial disease clusters in spatial epidemiology and disease surveillance. Many studies default to a maximum reported cluster size (MRCS) set at 50% of the total population when searching for spatial clusters. However, this default setting can sometimes report clusters larger than the true clusters, which then include less relevant regions. For the Poisson, Bernoulli, ordinal, normal, and exponential models, a Gini coefficient has been developed to optimize the MRCS. Yet, no such measure is available for the multinomial model.
    RESULTS: We propose two versions of a spatial cluster information criterion (SCIC) for selecting the optimal MRCS value for the multinomial-based spatial scan statistic. Our simulation study suggests that SCIC improves the accuracy of reporting true clusters. Analysis of the Korea Community Health Survey (KCHS) data further demonstrates that our method identifies more meaningful small clusters compared to the default setting.
    CONCLUSIONS: Our method focuses on improving the performance of the spatial scan statistic by optimizing the MRCS value when using the multinomial model. In public health and disease surveillance, the proposed method can be used to provide more accurate and meaningful spatial cluster detection for multinomial data, such as disease subtypes.
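    For orientation, the sketch below computes the multinomial scan statistic's log-likelihood ratio for a single hypothetical candidate cluster; the SCIC itself adds a penalty related to the reported cluster size, whose exact form is defined in the paper and is not reproduced here.

```python
# Hedged sketch: multinomial scan statistic log-likelihood ratio for one candidate cluster.
import numpy as np
from scipy.special import xlogy

def multinomial_llr(inside, total):
    """inside, total: per-category counts inside the candidate cluster and overall."""
    inside = np.asarray(inside, float)
    total = np.asarray(total, float)
    outside = total - inside
    C, N = inside.sum(), total.sum()
    ll_alt = xlogy(inside, inside / C).sum() + xlogy(outside, outside / (N - C)).sum()
    ll_null = xlogy(total, total / N).sum()
    return ll_alt - ll_null

# Hypothetical example: 3 disease subtypes, a candidate cluster enriched in subtype 2
print(multinomial_llr(inside=[10, 40, 12], total=[120, 150, 130]))
```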

  • Article type: Journal Article
    The various debates around model selection paradigms are important, but in the absence of a consensus, there is a demonstrable need for a deeper appreciation of existing approaches, at least among the end-users of statistics and model selection tools. In the ecological literature, the Akaike information criterion (AIC) dominates model selection practices, and while it is a relatively straightforward concept, there exist what we perceive to be some common misunderstandings around its application. Two specific questions arise with surprising regularity among colleagues and students when interpreting and reporting AIC model tables. The first is related to the issue of 'pretending' variables, and specifically a muddled understanding of what this means. The second is related to p-values and what constitutes statistical support when using AIC. There exists a wealth of technical literature describing AIC and the relationship between p-values and AIC differences. Here, we complement this technical treatment and use simulation to develop some intuition around these important concepts. In doing so, we aim to promote better statistical practices when it comes to using, interpreting, and reporting models selected with AIC.
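    A minimal simulation in the spirit of the paper's argument (data and model are synthetic assumptions): because AIC charges 2 units per extra parameter and the likelihood can only improve, a model that adds a pure-noise covariate to the best model is never more than about 2 AIC units worse, which is the 'pretending variable' pattern seen in ΔAIC tables.

```python
# Simulation sketch: distribution of the AIC change when a pure-noise covariate
# is added to the data-generating model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, reps, deltas = 100, 500, []
for _ in range(reps):
    x = rng.normal(size=n)
    noise_var = rng.normal(size=n)                  # unrelated 'pretending' covariate
    y = 2.0 * x + rng.normal(size=n)
    m1 = sm.OLS(y, sm.add_constant(x)).fit()
    m2 = sm.OLS(y, sm.add_constant(np.column_stack([x, noise_var]))).fit()
    deltas.append(m2.aic - m1.aic)

print(f"mean ΔAIC = {np.mean(deltas):.2f}, max ΔAIC = {np.max(deltas):.2f}")  # max stays below 2
```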

  • Article type: Journal Article
    In statistical inference, uncertainty is unknown and all models are wrong. That is to say, a person who builds a statistical model and a prior distribution is simultaneously aware that both are fictional candidates. To study such cases, statistical measures have been constructed, such as cross-validation, information criteria, and marginal likelihood; however, their mathematical properties have not yet been completely clarified when statistical models are under- or over-parametrized. We introduce a framework of the mathematical theory of Bayesian statistics for unknown uncertainty, which clarifies general properties of cross-validation, information criteria, and marginal likelihood, even if an unknown data-generating process is unrealizable by a model or if the posterior distribution cannot be approximated by any normal distribution. Hence it gives a helpful standpoint for a person who cannot believe in any specific model and prior. This paper consists of three parts. The first is a new result, whereas the second and third are well-known previous results with new experiments. We show that there exists a more precise estimator of the generalization loss than leave-one-out cross-validation, that there exists a more accurate approximation of the marginal likelihood than the Bayesian information criterion, and that the optimal hyperparameters for generalization loss and marginal likelihood are different. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
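    As a toy illustration of the quantities compared in the paper (not the paper's estimators), the sketch below uses a conjugate normal model where the exact log marginal likelihood is available in closed form, and contrasts it with the BIC approximation and with WAIC computed from posterior draws.

```python
# Minimal conjugate-normal illustration: exact log marginal likelihood vs. the
# BIC approximation, plus WAIC from posterior draws.
# Model: y_i ~ N(theta, sigma^2), theta ~ N(0, tau^2), sigma and tau known.
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)
sigma, tau, n = 1.0, 2.0, 50
y = rng.normal(loc=0.7, scale=sigma, size=n)

# Exact log marginal likelihood: y ~ N(0, sigma^2 I + tau^2 11^T)
cov = sigma ** 2 * np.eye(n) + tau ** 2 * np.ones((n, n))
log_ml = multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(y)

# BIC-based approximation: log p(y) ≈ max log-likelihood - (k/2) log n, with k = 1
theta_hat = y.mean()
log_ml_bic = norm.logpdf(y, theta_hat, sigma).sum() - 0.5 * np.log(n)

# WAIC from posterior draws of theta (posterior is normal in this conjugate model)
post_var = 1.0 / (n / sigma ** 2 + 1.0 / tau ** 2)
post_mean = post_var * y.sum() / sigma ** 2
draws = rng.normal(post_mean, np.sqrt(post_var), size=4000)
loglik = norm.logpdf(y[:, None], draws[None, :], sigma)      # shape (n, draws)
lppd = np.log(np.exp(loglik).mean(axis=1)).sum()
p_waic = loglik.var(axis=1, ddof=1).sum()
waic = -2.0 * (lppd - p_waic)

print(f"exact log marginal: {log_ml:.2f}, BIC approx: {log_ml_bic:.2f}, WAIC: {waic:.2f}")
```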

  • Article type: Journal Article
    The aim of the study is to increase the functional efficiency of a machine learning decision support system (DSS) for the diagnosis of oncopathology on the basis of tissue morphology. A method of hierarchical information-extreme machine learning for the diagnostic DSS is proposed. The method is developed within the framework of a functional approach to modeling the cognitive processes of natural intelligence in forming and accepting classification decisions. This approach, in contrast to neural-network structures, allows the diagnostic DSS to adapt to arbitrary histological imaging conditions and offers flexibility in retraining the system by expanding the alphabet of recognition classes that characterize different tissue morphology structures. In addition, the decision rules built within the geometric approach are practically invariant to the dimensionality of the diagnostic feature space. The developed method allows the creation of the information, algorithmic, and software components of an automated workstation for histologists diagnosing oncopathologies of different origins. The machine learning method is demonstrated on the example of diagnosing breast cancer.

  • Article type: Journal Article
    Parameter estimation accuracy and average sample number (ASN) reduction are important to improving target detection performance in sequential hypothesis tests. Multiple-input multiple-output (MIMO) radar can balance between parameter estimation accuracy and ASN reduction through waveform diversity. In this study, we propose a waveform design method based on a two-stage information criterion to improve multi-target detection performance. In the first stage, the waveform is designed to estimate the target parameters based on the criterion of single-hypothesis mutual information (MI) maximization under the constraint of the signal-to-noise ratio (SNR). In the second stage, the objective function is designed based on the criterion of MI minimization and Kullback-Leibler divergence (KLD) maximization between multi-hypothesis posterior probabilities, and the waveform is chosen from the waveform library of the first-stage parameter estimation. Furthermore, an adaptive waveform design algorithm framework for multi-target detection is proposed. The simulation results reveal that the waveform design based on the two-stage information criterion can rapidly detect the target direction. In addition, the waveform design based on the criterion of dual-hypothesis MI minimization can improve the parameter estimation performance, whereas the design based on the criterion of dual-hypothesis KLD maximization can improve the target detection performance.
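    The two information measures underlying the waveform criteria are standard closed-form Gaussian quantities; the sketch below evaluates them for hypothetical covariances, while the paper's actual objective functions, constraints, and waveform library are defined there.

```python
# Hedged sketch: mutual information of a Gaussian measurement model and the
# Kullback-Leibler divergence between two Gaussian hypotheses (closed forms).
import numpy as np

def gaussian_mi(signal_cov, noise_cov):
    """I(x; y) for y = x + n with x ~ N(0, Sx), n ~ N(0, Sn), in nats."""
    k = signal_cov.shape[0]
    return 0.5 * np.linalg.slogdet(np.eye(k) + signal_cov @ np.linalg.inv(noise_cov))[1]

def gaussian_kld(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ), in nats."""
    k = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k
                  + np.linalg.slogdet(cov1)[1] - np.linalg.slogdet(cov0)[1])

Sx = np.diag([2.0, 1.0])          # hypothetical target-return covariance
Sn = 0.5 * np.eye(2)              # hypothetical noise covariance
print("MI  =", gaussian_mi(Sx, Sn))
print("KLD =", gaussian_kld(np.zeros(2), Sn, np.array([1.0, 0.5]), Sx + Sn))
```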

  • Article type: Journal Article
    Model selection is often implicit: when performing an ANOVA, one assumes that the normal distribution is a good model of the data; fitting a tuning curve implies that an additive and a multiplicative scaler describe the behavior of the neuron; even calculating an average implicitly assumes that the data were sampled from a distribution that has a finite first statistical moment: the mean. Model selection may be explicit, when the aim is to test whether one model provides a better description of the data than a competing one. As a special case, clustering algorithms identify groups with similar properties within the data. They are widely used, from spike sorting to cell type identification to gene expression analysis. We discuss model selection and clustering techniques from a statistician's point of view, revealing the assumptions behind, and the logic that governs, the various approaches. We also showcase important neuroscience applications and provide suggestions on how neuroscientists can put model selection algorithms to best use, as well as what mistakes should be avoided.
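    As a short example of making clustering model selection explicit (not from the article), the sketch below chooses the number of Gaussian mixture components by BIC with scikit-learn on synthetic two-dimensional data.

```python
# Sketch: pick the number of mixture components by BIC on synthetic data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in ([0, 0], [2, 2], [0, 3])])

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(bics, "-> selected k =", best_k)
```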
