62P10

  • Article type: Journal Article
    In many biomedical applications, we are more interested in the predicted probability that a numerical outcome is above a threshold than in the predicted value of the outcome. For example, it might be known that antibody levels above a certain threshold provide immunity against a disease, or a threshold for a disease severity score might reflect conversion from the presymptomatic to the symptomatic disease stage. Accordingly, biomedical researchers often convert numerical to binary outcomes (loss of information) to conduct logistic regression (probabilistic interpretation). We address this bad statistical practice by modelling the binary outcome with logistic regression, modelling the numerical outcome with linear regression, transforming the predicted values from linear regression to predicted probabilities, and combining the predicted probabilities from logistic and linear regression. Analysing high-dimensional simulated and experimental data, namely clinical data for predicting cognitive impairment, we obtain significantly improved predictions of dichotomised outcomes. Thus, the proposed approach effectively combines binary with numerical outcomes to improve binary classification in high-dimensional settings. An implementation is available in the R package cornet on GitHub (https://github.com/rauschenberger/cornet) and CRAN (https://CRAN.R-project.org/package=cornet).
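
The last two steps described above (transforming predicted values into predicted probabilities, then combining them with the logistic probabilities) can be sketched as follows. The abstract does not state the transformation or the combination rule, so the Gaussian CDF with scale `sigma` and the convex weight `weight` are assumptions made for illustration; the input numbers are hypothetical, and in practice both tuning values would have to be chosen from data (for example by cross-validation).

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_to_probability(y_hat, threshold, sigma):
    """P(outcome > threshold) for a predicted value y_hat,
    assuming Gaussian prediction error with standard deviation sigma."""
    return normal_cdf((y_hat - threshold) / sigma)

def combine(p_logistic, p_linear, weight):
    """Convex combination; weight is the share given to the linear-model probability."""
    return weight * p_linear + (1.0 - weight) * p_logistic

# Hypothetical predictions for three samples (not taken from the paper).
y_hat = [1.2, 2.0, 3.1]          # predicted values from linear regression
p_logistic = [0.20, 0.55, 0.80]  # predicted probabilities from logistic regression
threshold, sigma, weight = 2.0, 1.0, 0.5

p_linear = [linear_to_probability(v, threshold, sigma) for v in y_hat]
p_combined = [combine(a, b, weight) for a, b in zip(p_logistic, p_linear)]
print(p_combined)
```

A predicted value exactly at the threshold maps to a probability of 0.5, and the combined probability always lies between the two inputs, so it remains a valid probability.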

  • Article type: Journal Article
    The main concern of this paper is to provide a flexible discrete model that captures every kind of dispersion (equi-, over- and under-dispersion). Based on the balanced discretization method, a new discrete version of the Burr-Hatke distribution is introduced with the partial moment-preserving property. Some statistical properties of the new distribution are presented, and the applicability of the proposed model is evaluated on count series. A new integer-valued autoregressive (INAR) process, based on mixing the Pegram and binomial thinning operators with discrete Burr-Hatke innovations, is introduced; it can model contagious data properly. Different approaches to estimating the parameters of the new process are provided and compared through a Monte Carlo simulation scheme. The performance of the proposed process is evaluated on four data sets of daily COVID-19 death counts in Austria, Switzerland, Nigeria and Slovenia, in comparison with some competitor INAR(1) models, along with a Pearson residual analysis to assess the model. The goodness-of-fit measures affirm the adequacy of the proposed process in modeling all COVID-19 data sets. Fundamental prediction procedures for the new process are considered using classic, modified sieve bootstrap and Bayesian forecasting methods for all COVID-19 data sets, and it is concluded that the Bayesian forecasting approach provides more reliable results.
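
To make the binomial thinning operator concrete, here is a minimal simulation of a plain INAR(1) process. It is only a sketch: the Pegram mixing operator is omitted, and Poisson innovations stand in for the discrete Burr-Hatke innovations, whose probability mass function is not given in the abstract. The function names and parameter values are hypothetical.

```python
import math
import random

def binomial_thinning(alpha, x, rng):
    """alpha ∘ x: number of successes among x independent Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + e_t with placeholder Poisson(lam) innovations."""
    rng = random.Random(seed)
    x = poisson(lam / (1.0 - alpha), rng)  # start near the stationary mean
    series = []
    for _ in range(n):
        x = binomial_thinning(alpha, x, rng) + poisson(lam, rng)
        series.append(x)
    return series

print(simulate_inar1(20, alpha=0.6, lam=2.0))
```

Thinning keeps each of the previous period's counts with probability `alpha`, so the series stays integer-valued and non-negative, which a Gaussian autoregression cannot guarantee.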

  • Article type: Journal Article
    Association testing has been widely used to study the relationship between genetic variants and phenotypes. Most association testing methods are genotype-based, i.e. they first estimate the genotype and then regress the phenotype on the estimated genotype and other variables. Direct testing methods based on next-generation sequencing (NGS) data without genotype calling have been proposed and have shown advantages over genotype-based methods in scenarios where genotype calling is not accurate. NGS data-based single-variant testing methods have been proposed, including our previously proposed single-variant testing method, i.e. the UNC combo method [1]. We have also proposed NGS data-based group testing methods for continuous phenotypes, using a linear model framework that can handle continuous responses [2]. In this paper, we extend our linear model-based framework to a generalized linear model-based framework so that the methods can handle other types of responses, especially binary responses, which are commonly encountered in association studies. We have conducted extensive simulation studies to evaluate the performance of different estimators and to compare our estimators with their corresponding genotype-based methods. We found that all methods control the Type I error, and that our NGS data-based testing methods perform better than their corresponding genotype-based methods in the literature for other types of responses, including binary responses (logistic regression) and count responses (Poisson regression), especially when sequencing depth is low. In conclusion, we have extended our previous linear model (LM) framework to a generalized linear model (GLM) framework and derived NGS data-based testing methods for a group of genetic variants. Compared with our previously proposed LM-based methods [2], the new GLM-based methods can handle more complex responses (for example, binary responses and count responses) in addition to continuous responses. Our methods fill a gap in the literature and show advantages over their corresponding genotype-based methods in the literature.

  • Article type: Journal Article
    Beta distributions are commonly used to model proportion-valued response variables, which are often encountered in longitudinal studies. In this article, we develop semi-parametric Beta regression models for proportion-valued responses, where the aggregate covariate effect is summarized and flexibly modeled using an interpretable monotone time-varying single-index transform of a linear combination of the potential covariates. We utilize the potential of single-index models, which are effective dimension reduction tools and accommodate link function misspecification in generalized linear mixed models. Our Bayesian methodology incorporates the missing-at-random feature of the proportion response and utilizes Hamiltonian Monte Carlo sampling to conduct inference. We explore finite-sample frequentist properties of our estimates and assess their robustness via detailed simulation studies. Finally, we illustrate our methodology via application to a motivating longitudinal dataset from obesity research that records the proportion of body fat.

  • Article type: Preprint
    With a single circulating vector-borne virus, the basic reproduction number incorporates contributions from tick-to-tick (co-feeding), tick-to-host and host-to-tick transmission routes. With two different circulating vector-borne viral strains, resident and invasive, and under the assumption that co-feeding is the only transmission route in a tick population, the invasion reproduction number depends on whether the model system of ordinary differential equations possesses the property of neutrality. We show that a simple model, with two populations of ticks infected with one strain, resident or invasive, and one population of co-infected ticks, does not have Alizon's neutrality property. We present model alternatives that are capable of representing the invasion potential of a novel strain by including populations of ticks dually infected with the same strain. The invasion reproduction number is analysed with the next-generation method and via numerical simulations.
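
For readers unfamiliar with the next-generation method, the sketch below computes a reproduction number as the spectral radius of K = F V⁻¹, where F collects the rates of new infections and V the transition rates out of the infected compartments. The 2×2 matrices are hypothetical illustration values, not the tick model from the preprint, and the helper names are made up for this example.

```python
import cmath

def next_generation_matrix(f, v):
    """Return K = F V^{-1} for 2x2 matrices given as nested lists."""
    (a, b), (c, d) = v
    det = a * d - b * c
    v_inv = [[d / det, -b / det], [-c / det, a / det]]
    return [[sum(f[i][m] * v_inv[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_radius_2x2(k):
    """Largest eigenvalue modulus of a 2x2 matrix, from its trace and determinant."""
    (a, b), (c, d) = k
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))

# Hypothetical rates: two infected compartments that infect each other.
F = [[0.0, 0.6], [0.4, 0.0]]   # new-infection rates
V = [[0.5, 0.0], [0.0, 0.2]]   # removal/transition rates

K = next_generation_matrix(F, V)
print(spectral_radius_2x2(K))
```

A value above 1 indicates that the infection (or, in the invasion setting, the novel strain) can spread when rare; a value below 1 indicates that it dies out.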

  • Article type: Journal Article
    Statistical learning of the structures of cellular networks, such as protein signaling pathways, is a topical research field in computational systems biology. To get the most information out of experimental data, it is often necessary to develop a tailored statistical approach rather than apply one of the off-the-shelf network reconstruction methods. The focus of this paper is on learning the structure of the mTOR protein signaling pathway from immunoblotting protein phosphorylation data. Under two experimental conditions, eleven phosphorylation sites of eight key proteins of the mTOR pathway were measured at ten non-equidistant time points. For the statistical analysis, we propose a new advanced hierarchically coupled non-homogeneous dynamic Bayesian network (NH-DBN) model, and we consider various data imputation methods for dealing with non-equidistant temporal observations. Because of the absence of a true gold-standard network, we propose to use predictive probabilities in combination with a leave-one-out cross-validation strategy to objectively cross-compare the accuracies of different NH-DBN models and data imputation methods. Finally, we employ the best combination of model and data imputation method to predict the structure of the mTOR protein signaling pathway.

  • Article type: Journal Article
    Several statistical models have been proposed in recent years, among them semiparametric regression. In medicine, there are several situations in which it is impractical to consider linear regression for statistical modeling, especially when the data contain explanatory variables that have a nonlinear relationship with the response variable. Another common situation is when the response variable does not have a unimodal shape, and it is not possible to adopt distributions belonging to the symmetric or asymmetric classes. In this context, a semiparametric heteroskedastic regression is proposed based on an extension of the normal distribution. We then show the usefulness of this model for analyzing the cost of prostate cancer surgery. The predictor variables refer to two groups of patients: one group receives a multimodal local anesthetic solution (Preemptive Target Anesthetic Solution) and the second group is treated with neuraxial blockade (spinal anesthesia/traditional standard). Other relevant predictor variables are also evaluated, allowing an in-depth interpretation of the predictor variables that have a nonlinear effect on the dependent variable, cost. The penalized maximum likelihood method is adopted to estimate the model parameters. The new regression is a useful statistical tool for analyzing medical data.

  • Article type: Journal Article
    Current methods for clustering adult obesity prevalence by state focus on creating a single map of obesity prevalence for a given year in the United States. Comparing these maps for different years may limit our understanding of the progression of state and regional obesity prevalence over time for the purpose of developing targeted regional health policies. In this application note, we adopt the non-parametric Dynamic Time Warping method for clustering longitudinal time series of obesity prevalence by state. This method captures the lead and lag relationship between the time series as part of the temporal alignment, allowing us to produce a single map that captures the regional and temporal clusters of obesity prevalence from 1990 to 2019 in the United States. We identify six regions of obesity prevalence in the United States and forecast future estimates of obesity prevalence based on ARIMA models.
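
The reason Dynamic Time Warping can absorb lead and lag relationships is visible in its standard dynamic-programming recursion, sketched below. This is the textbook DTW distance with an absolute-difference local cost, not the authors' clustering pipeline, and the example series are hypothetical values rather than real prevalence data.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both series advance
    return cost[n][m]

# Hypothetical series: the second follows the same path one step later.
state_a = [10, 12, 15, 19]
state_b = [10, 10, 12, 15, 19]
print(dtw_distance(state_a, state_b))
```

Because one point may be matched to several points in the other series, a series that follows the same trajectory with a delay gets a small distance. A matrix of such pairwise distances can then be passed to a clustering method of choice.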

  • Article type: Journal Article
    We study the performance of shape-constrained methods for evaluating immune response profiles from early-phase vaccine trials. The motivating problem for this work involves quantifying and comparing the IgG binding immune responses to the first and second variable loops (V1V2 region) arising in HVTN 097 and HVTN 100 HIV vaccine trials. We consider unimodal and log-concave shape-constrained methods to compare the immune profiles of the two vaccines, which is reasonable because the data support that the underlying densities of the immune responses could have these shapes. To this end, we develop novel shape-constrained tests of stochastic dominance and shape-constrained plug-in estimators of the squared Hellinger distance between two densities. Our techniques are either tuning parameter free, or rely on only one tuning parameter, but their performance is either better (the tests of stochastic dominance) or comparable with the nonparametric methods (the estimators of the squared Hellinger distance). The minimal dependence on tuning parameters is especially desirable in clinical contexts where analyses must be prespecified and reproducible. Our methods are supported by theoretical results and simulation studies.
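
As a reminder of the quantity being estimated, the sketch below evaluates the squared Hellinger distance for two discrete probability vectors, using the convention H² = ½ Σ (√p − √q)², which ranges from 0 to 1. The shape-constrained plug-in estimators from the abstract would instead plug estimated densities into the continuous analogue of this formula; that estimation step is not shown, and the example vectors are hypothetical.

```python
import math

def squared_hellinger(p, q):
    """Squared Hellinger distance between two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

# Hypothetical binned response distributions for two vaccine groups.
group_1 = [0.1, 0.2, 0.4, 0.3]
group_2 = [0.3, 0.4, 0.2, 0.1]
print(squared_hellinger(group_1, group_2))
```

Identical distributions give 0 and distributions with disjoint support give 1, which makes the value easy to read as a measure of how far apart two immune response profiles are.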

  • Article type: Journal Article
    In this paper, we present an efficient statistical method (denoted as 'Adaptive Resources Allocation CUSUM') to robustly and efficiently detect hotspots with limited sampling resources. Our main idea is to combine multi-armed bandit (MAB) and change-point detection methods to balance the exploration and exploitation of resource allocation for hotspot detection. Further, a Bayesian weighted update is used to update the posterior distribution of the infection rate. Then, the upper confidence bound (UCB) is used for resource allocation and planning. Finally, CUSUM monitoring statistics are used to detect the change point as well as the change location. For performance evaluation, we compare the proposed method with several benchmark methods in the literature and show that the proposed algorithm achieves a lower detection delay and higher detection precision. Finally, the method is applied to hotspot detection in a real case study of county-level daily positive COVID-19 cases in Washington State (WA) and demonstrates its effectiveness with very limited distributed samples.
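
The two building blocks named above can be sketched separately in their textbook forms: a one-sided CUSUM statistic for detecting an upward shift, and the UCB1 index for deciding where to sample next. How the paper couples them with the Bayesian weighted update is not specified in the abstract and is not reproduced here; the function names, parameters and data below are hypothetical.

```python
import math

def cusum_alarm(xs, target_mean, slack, threshold):
    """Return the first index at which the one-sided CUSUM statistic exceeds
    the threshold, or None if it never does."""
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate evidence of an upward shift; reset at zero.
        s = max(0.0, s + (x - target_mean - slack))
        if s > threshold:
            return t
    return None

def ucb_index(mean_reward, pulls, total_pulls):
    """UCB1 index: empirical mean plus an exploration bonus that shrinks with pulls."""
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)

# Hypothetical monitored series with an upward shift starting at index 4.
series = [0, 0, 0, 0, 3, 3, 3]
print(cusum_alarm(series, target_mean=0.0, slack=0.5, threshold=4.0))

# Hypothetical locations with equal observed rates; the less-sampled one scores higher.
print(ucb_index(0.2, pulls=2, total_pulls=20), ucb_index(0.2, pulls=10, total_pulls=20))
```

The exploration bonus directs scarce samples toward locations that have been observed less often, while the CUSUM statistic accumulates evidence over time so that a sustained shift triggers an alarm even when individual observations are noisy.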