Statistical model

  • Article type: Case Reports
    Two probabilistic genotyping (PG) programs, STRMix™ and TrueAllele™, were used to assess the strength of the same item of DNA evidence in a federal criminal case, with strikingly different results. For STRMix, the reported likelihood ratio (LR) in favor of the non-contributor hypothesis was 24; for TrueAllele it ranged from 1.2 million to 16.7 million, depending on the reference population. This case report seeks to explain why the two programs produced different results and to consider what the difference tells us about the reliability and trustworthiness of these programs. It uses a locus-by-locus breakdown to trace the differing results to subtle differences in modeling parameters and methods, analytic thresholds, and mixture ratios, as well as TrueAllele's use of an ad hoc procedure for assigning LRs at some loci. These findings illustrate the extent to which PG analysis rests on a lattice of contestable assumptions, highlighting the importance of rigorous validation of PG programs using known-source test samples that closely replicate the characteristics of evidentiary samples. The article also points out misleading aspects of the way STRMix and TrueAllele results are routinely presented in reports and testimony and calls for clarification of forensic reporting standards to address those problems.
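
As a rough illustration of why a locus-by-locus breakdown matters, the sketch below multiplies per-locus likelihood ratios into an overall LR. It assumes the loci are treated as independent, which is a common simplification and not a description of how STRMix or TrueAllele work internally. The function name `combine_locus_lrs` and all per-locus values are hypothetical and are not taken from the case.

```python
import math

def combine_locus_lrs(locus_lrs):
    """Multiply per-locus likelihood ratios, assuming independent loci.

    Returns the overall LR and its log10, summing logs for numerical stability.
    """
    log10_total = sum(math.log10(lr) for lr in locus_lrs)
    return 10 ** log10_total, log10_total

# Hypothetical per-locus LRs for two programs (illustrative values only).
program_a = [1.5, 0.8, 2.0, 1.2, 1.1]
program_b = [3.0, 1.6, 4.0, 2.4, 2.2]  # every locus is twice program_a's value

total_a, log_a = combine_locus_lrs(program_a)
total_b, log_b = combine_locus_lrs(program_b)
print(f"A: LR = {total_a:.2f}; B: LR = {total_b:.2f}; ratio = {total_b / total_a:.1f}")
```

In this toy setup a factor-of-two disagreement at each of five loci compounds into a much larger disagreement in the overall LR, which is the kind of effect a per-locus comparison can expose.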

  • Article type: Systematic Review
    BACKGROUND: Leprosy is an infectious disease caused by Mycobacterium leprae and remains a source of preventable disability if left undetected. Case detection delay is an important epidemiological indicator of progress in interrupting transmission and preventing disability in a community. However, no standard method exists to effectively analyse and interpret this type of data. In this study, we aim to evaluate the characteristics of leprosy case detection delay data and select an appropriate model for the variability of detection delays based on the best-fitting distribution type.
    METHODS: Two sets of leprosy case detection delay data were evaluated: a cohort of 181 patients from the post-exposure prophylaxis for leprosy (PEP4LEP) study in high-endemic districts of Ethiopia, Mozambique, and Tanzania; and self-reported delays from 87 individuals in 8 low-endemic countries collected as part of a systematic literature review. Bayesian models were fit to each dataset to assess, using leave-one-out cross-validation, which probability distribution (log-normal, gamma or Weibull) best describes variation in observed case detection delays, and to estimate the effects of individual factors.
    RESULTS: For both datasets, detection delays were best described with a log-normal distribution combined with the covariates age, sex and leprosy subtype [expected log predictive density (ELPD) for the joint model: -1123.9]. Patients with multibacillary (MB) leprosy experienced longer delays than those with paucibacillary (PB) leprosy, with a relative difference of 1.57 [95% Bayesian credible interval (BCI): 1.14-2.15]. Those in the PEP4LEP cohort had 1.51 (95% BCI: 1.08-2.13) times longer case detection delay than the self-reported patient delays in the systematic review.
    CONCLUSIONS: The log-normal model presented here could be used to compare leprosy case detection delay datasets, including PEP4LEP, where the primary outcome measure is reduction in case detection delay. We recommend the application of this modelling approach to test different probability distributions and covariate effects in studies with similar outcomes in the field of leprosy and other skin-NTDs.
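
To make the "relative difference" concrete: under a log-normal model, a difference in the mean of log-delays corresponds to a multiplicative ratio of median delays. The sketch below is a plug-in calculation on made-up numbers, not the Bayesian regression with covariates used in the study; `median_ratio_lognormal` and both delay lists are hypothetical.

```python
import math
from statistics import mean

def median_ratio_lognormal(delays_a, delays_b):
    """Ratio of median delays (a over b) under a log-normal assumption.

    For a log-normal variable the median is exp(mean of the logs), so the
    ratio of medians is exp(difference in log-scale means).
    """
    log_mean_a = mean(math.log(d) for d in delays_a)
    log_mean_b = mean(math.log(d) for d in delays_b)
    return math.exp(log_mean_a - log_mean_b)

# Hypothetical detection delays in months (illustrative values only).
pb_delays = [3.0, 6.0, 12.0, 24.0]
mb_delays = [1.57 * d for d in pb_delays]  # constructed to be 1.57 times longer

ratio = median_ratio_lognormal(mb_delays, pb_delays)
print(f"Relative difference (MB vs PB): {ratio:.2f}")
```

A full analysis along the lines of the abstract would also compare candidate distributions by out-of-sample predictive density and propagate uncertainty into a credible interval, which this sketch does not attempt.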

  • Article type: Journal Article
    OBJECTIVE: The main objective of the state Behavioral Risk Factor Surveillance System (BRFSS) is to produce reliable state-level estimates of various population health outcomes. A multilevel regression and post-stratification (MRP) methodology for small area estimation has been applied to the 500 Cities Project to provide population estimates at both city level and census-tract level using national BRFSS data. To date, MRP has not been applied to any state BRFSS to produce health data for local geographic areas. In addition, the use of single-year BRFSS data might produce temporal inconsistency in small area estimates (SAEs). The predicted standard errors (SEs) and confidence intervals (CIs) of SAEs obtained using Monte Carlo simulation could be substantially underestimated or overestimated.
    METHODS: By extending the current MRP approach and applying a parametric bootstrapping approach to the Connecticut BRFSS (CT BRFSS), we were able to produce SAEs as well as SEs and CIs of SAEs for Connecticut counties and towns. We also applied this model to 5-year CT BRFSS data (2011-2015) with the aim of improving the temporal consistency of SAEs.
    RESULTS: Both single-year and 5-year estimates with SEs and CIs were generated for six selected population health indicators at town, county and state levels. Model-based SAEs were internally evaluated by comparison with single-year and 5-year direct BRFSS survey estimates (2011-2015). SAEs were also externally validated when external data were available.
    CONCLUSIONS: Model-based SAEs are valid and could be used to characterize local geographic variations using single-state BRFSS data.
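
The post-stratification step of MRP can be sketched as a population-weighted average of model-predicted prevalences across demographic cells. The example below shows only that step, under the assumption that cell-level predictions are already available from a fitted multilevel model; the function `poststratify`, the cell definitions, and all numbers are hypothetical.

```python
def poststratify(cell_estimates, cell_counts):
    """Population-weighted average of cell-level predicted prevalences.

    cell_estimates: predicted prevalence per demographic cell (0 to 1).
    cell_counts: population count per cell in the target small area.
    """
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell] for cell in cell_estimates)
    return weighted / total

# Hypothetical cells keyed by (age group, sex) for one town.
predicted = {("18-44", "F"): 0.10, ("18-44", "M"): 0.12,
             ("45+", "F"): 0.25, ("45+", "M"): 0.30}
census = {("18-44", "F"): 1200, ("18-44", "M"): 1100,
          ("45+", "F"): 900, ("45+", "M"): 800}

town_estimate = poststratify(predicted, census)
print(f"Post-stratified town estimate: {town_estimate:.3f}")
```

A parametric bootstrap in the spirit of the abstract would repeat this calculation over many simulated sets of cell predictions to obtain SEs and CIs; that part is omitted here.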

  • Article type: Journal Article
    Real-time prediction of surgical duration can inform perioperative decisions and reduce surgical costs. We developed a machine learning approach that continuously incorporates preoperative and intraoperative information for forecasting surgical duration.
    Preoperative (e.g. procedure name) and intraoperative (e.g. medications and vital signs) variables were retrieved from anaesthetic records of surgeries performed between March 1, 2019 and October 31, 2019. A modular artificial neural network was developed and compared with a Bayesian approach and the scheduled surgical duration. Continuous ranked probability score (CRPS) was used as a measure of time error to assess model accuracy. To evaluate clinical performance, the accuracy of each approach was assessed in identifying cases that ran beyond 15:00 (the commonly scheduled end of shift), thus identifying opportunities to avoid overtime labour costs.
    The analysis included 70,826 cases performed at eight hospitals. The modular artificial neural network had the lowest time error (CRPS: mean=13.8; standard deviation=35.4 min), which was significantly better (mean difference=6.4 min [95% confidence interval: 6.3-6.5]; P<0.001) than the Bayesian approach. The modular artificial neural network also had the highest accuracy in identifying operating theatres that would overrun 15:00 (accuracy at 1 h prior=89%) compared with the Bayesian approach (80%) and a naïve approach using the scheduled duration (78%).
    A real-time neural network model using preoperative and intraoperative data had significantly better performance than a Bayesian approach or scheduled duration, offering opportunities to avoid overtime labour costs and reduce the cost of surgery by providing superior real-time information for perioperative decision support.
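
For readers unfamiliar with CRPS, the sketch below computes it for a forecast represented by a set of samples, using the standard identity CRPS = E|X - y| - 0.5 E|X - X'|. How the study's models represented their predictive distributions is not stated in the abstract, so the sample-based form, the function name `crps_from_samples`, and the durations are assumptions.

```python
def crps_from_samples(samples, observed):
    """Sample-based CRPS: mean |x - y| minus half the mean pairwise |x - x'|.

    Lower is better; for a single-point forecast it reduces to absolute error.
    """
    n = len(samples)
    term_obs = sum(abs(x - observed) for x in samples) / n
    term_spread = sum(abs(x - y) for x in samples for y in samples) / (n * n)
    return term_obs - 0.5 * term_spread

# Hypothetical forecast samples of surgical duration in minutes.
forecast_minutes = [95.0, 110.0, 120.0, 130.0, 150.0]
actual_minutes = 125.0
print(f"CRPS = {crps_from_samples(forecast_minutes, actual_minutes):.2f} min")
```

Because CRPS is expressed in the same units as the outcome, it can be read as a time error, which is consistent with how the abstract reports it in minutes.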

  • Article type: Journal Article
    Supervised machine learning (ML) is being featured in the health care literature, with study results frequently reported using metrics such as accuracy, sensitivity, specificity, recall, or F1 score. Although each metric provides a different perspective on performance, they remain overall measures for the whole sample, discounting the uniqueness of each case or patient. Intuitively, we know that not all cases are equal, but the present evaluative approaches do not take case difficulty into account.
    A more case-based, comprehensive approach is warranted to assess supervised ML outcomes and forms the rationale for this study. This study aims to demonstrate how item response theory (IRT) can be used to stratify the data based on how difficult each case is to classify, independent of the outcome measure of interest (eg, accuracy). This stratification allows the evaluation of ML classifiers to take the form of a distribution rather than a single scalar value.
    Two large, public intensive care unit data sets, Medical Information Mart for Intensive Care III and electronic intensive care unit, were used to showcase this method in predicting mortality. For each data set, a balanced sample (n=8078 and n=21,940, respectively) and an imbalanced sample (n=12,117 and n=32,910, respectively) were drawn. A 2-parameter logistic model was used to provide scores for each case. Several ML algorithms were used in the demonstration to classify cases based on their health-related features: logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, naive Bayes, and a neural network. Generalized linear mixed model analyses were used to assess the effects of case difficulty strata, ML algorithm, and the interaction between them in predicting accuracy.
    The results showed significant effects (P<.001) for case difficulty strata, ML algorithm, and their interaction in predicting accuracy and illustrated that all classifiers performed better with easier-to-classify cases and that overall the neural network performed best. Significant interactions suggest that cases that fall in the most difficult strata should be handled by logistic regression, linear discriminant analysis, decision tree, or neural network but not by naive Bayes or K-nearest neighbors. Conventional metrics for ML classification have been reported for methodological comparison.
    This demonstration shows that using IRT is a viable method for understanding the data that are provided to ML algorithms, independent of outcome measures, and highlights how well classifiers differentiate cases of varying difficulty. This method explains which features are indicative of healthy states and why. It enables end users to tailor the classifier that is appropriate to the difficulty level of the patient for personalized medicine.
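
The 2-parameter logistic (2PL) model gives the probability of a correct response as a logistic function of ability minus difficulty, scaled by a discrimination parameter. The sketch below shows that curve and a simple rank-based split into difficulty strata. How the study mapped cases and classifiers onto IRT "items" and "respondents" is not spelled out in the abstract, so the roles, the names `p_correct` and `stratify_by_difficulty`, and the values are assumptions.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def stratify_by_difficulty(difficulties, n_strata):
    """Assign each case a stratum index by difficulty rank (0 = easiest)."""
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    strata = [0] * len(difficulties)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(difficulties)
    return strata

# Hypothetical difficulty estimates for four cases.
case_difficulty = [0.5, -1.0, 2.0, 0.0]
print(stratify_by_difficulty(case_difficulty, n_strata=2))
print(f"P(correct) for theta=1.0, a=1.2, b=0.5: {p_correct(1.0, 1.2, 0.5):.3f}")
```

With strata in hand, a classifier's accuracy can be reported per stratum, which yields the distribution of performance across difficulty levels that the abstract describes.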

  • Article type: Journal Article
    The mainstream interventions used during the 2014-2016 Ebola epidemic were contact tracing and case isolation. The Ebola outbreak in Nigeria that formed part of the 2014-2016 epidemic demonstrated the effectiveness of control interventions with a 100% hospitalization rate. Here, we aim to explicitly estimate the protective effect of case isolation, reconstructing the timing of illness onset and hospitalization as well as the transmission network. We show that case isolation reduced the reproduction number and shortened the serial interval. Employing Bayesian inference with the Markov chain Monte Carlo method for parameter estimation and assuming that the reproduction number declines exponentially over time, the protective effect of case isolation was estimated to be 39.7% (95% credible interval: 2.4%-82.1%). The individual protective effect of case isolation was also estimated, showing that the effectiveness was dependent on the speed, i.e. the time from onset of illness to hospitalization.
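
The assumption of an exponentially declining reproduction number can be written as R(t) = R0 · exp(-δt). The sketch below evaluates that curve and the time at which it falls to 1. The parameterization, the names `reproduction_number` and `time_to_threshold`, and the values of R0 and δ are illustrative assumptions; the abstract does not report them, and the study's estimate of the protective effect is not reproduced here.

```python
import math

def reproduction_number(t, r0, delta):
    """Reproduction number at time t under exponential decline from r0."""
    return r0 * math.exp(-delta * t)

def time_to_threshold(r0, delta):
    """Time at which R(t) reaches 1 (0 if transmission is already subcritical)."""
    if r0 <= 1.0:
        return 0.0
    return math.log(r0) / delta

# Hypothetical values: initial R of 2.5 and a decline rate of 0.1 per day.
r0, delta = 2.5, 0.1
t_star = time_to_threshold(r0, delta)
print(f"R falls to 1 after about {t_star:.1f} days")
```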

  • Article type: Journal Article
    BACKGROUND: Malaria transmission is influenced by a complex interplay of factors, including climatic, socio-economic and environmental factors and interventions. Malaria control efforts across Africa have shown a mixed impact. Climate-driven factors may play an increasing role with climate change. Efforts to strengthen routine facility-based monthly malaria data collection across Africa create an increasingly valuable data source to interpret burden trends and monitor control programme progress. A better understanding of the association of malaria incidence with climatic and non-climatic drivers over time and space may help guide and interpret the impact of interventions.
    METHODS: Routine monthly paediatric outpatient clinical malaria case data were compiled from 27 districts in Malawi between 2004 and 2017, and analysed in combination with data on climatic, environmental, socio-economic and interventional factors and district-level population estimates. A spatio-temporal generalized linear mixed model was fitted using Bayesian inference in order to quantify the strength of association of the various risk factors with district-level variation in clinical malaria rates in Malawi, and the results were visualized using maps.
    RESULTS: Between 2004 and 2017, reported childhood clinical malaria case rates showed a slight increase, from 50 to 53 cases per 1000 population, with considerable variation across the country between climatic zones. Climatic and environmental factors, including average monthly air temperature and rainfall anomalies, the normalized difference vegetation index (NDVI), and the use of rapid diagnostic tests (RDTs) for diagnosis, showed a significant relationship with malaria incidence. Temperature in the current month and in each of the 3 months prior showed a significant relationship with disease incidence, unlike rainfall anomaly, which was associated with malaria incidence only at three months prior. Estimated risk maps show relatively high risk along the lake and Shire valley regions of Malawi.
    CONCLUSIONS: The modelling approach can identify locations likely to have unusually high or low risk of malaria incidence across Malawi, and distinguishes between contributions to risk that can be explained by measured risk factors and unexplained residual spatial variation. Also, spatial statistical methods applied to readily available routine data provide an alternative information source that can supplement survey data in policy development and implementation to direct surveillance and intervention efforts.
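
The lagged climate effects described in the results (temperature at lags 0 to 3 months, rainfall anomaly at a 3-month lag) imply building lagged covariates before fitting a model. The sketch below shows only that data-preparation step; the function `lagged_rows` and the temperature series are hypothetical, and the spatio-temporal mixed model itself is not reproduced.

```python
def lagged_rows(series, max_lag):
    """Build one row per month with the current value and each prior lag.

    Row for month t is (x[t], x[t-1], ..., x[t-max_lag]); the first max_lag
    months are dropped because their lags are not observed.
    """
    return [
        tuple(series[t - k] for k in range(max_lag + 1))
        for t in range(max_lag, len(series))
    ]

# Hypothetical monthly mean air temperatures in degrees Celsius.
monthly_temperature = [24.1, 25.0, 26.3, 25.7, 23.9, 22.4]
for row in lagged_rows(monthly_temperature, max_lag=3):
    print(row)
```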

  • The first Ebola virus disease (EVD) case in the United States (US) was confirmed on September 30, 2014, in a 45-year-old man. This event attracted considerable media attention, and there was fear of an EVD outbreak in the US.
    This study examined whether emergency department (ED) visits changed in metropolitan Dallas-Fort Worth, Texas (DFW) after this EVD case was confirmed. Using Texas Health Services Region 2/3 syndromic surveillance data and focusing on DFW, interrupted time series analyses were conducted using segmented regression models with autoregressive errors for overall ED visits and rates of several chief complaints, including fever with gastrointestinal distress (FGI). The date of confirmation of the fatal case was the "event."
    Results indicated the event was highly significant for ED visits overall (P<0.05) and for the rate of FGI visits (P<0.0001). An immediate increase in total ED visits of 1,023 visits per day (95% CI: 797.0, 1,252.8) was observed, equivalent to an 11.8% (95% CI: 9.2%, 14.4%) increase in ED visits overall. Visits overall and the rate of FGI visits in DFW increased significantly immediately after confirmation of the EVD case and remained elevated for several months, even after adjusting for seasonality, both within symptom-specific chief complaints and overall.
    These results have implications for ED surge capacity as well as for public health messaging in the wake of a public health emergency.
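
A segmented regression for an interrupted time series typically models the outcome as baseline level + baseline trend + level change at the event + trend change after the event. The sketch below builds the corresponding design row and evaluates a prediction. It omits the autoregressive error structure mentioned in the abstract, and the names `segmented_row` and `predict` and all coefficients except the 1,023 visits-per-day level change are hypothetical.

```python
def segmented_row(t, event_time):
    """Design row (intercept, time, post-event indicator, time since event)."""
    post = 1 if t >= event_time else 0
    return (1, t, post, (t - event_time) * post)

def predict(coefs, t, event_time):
    """Linear prediction from coefficients (b0, b1, b2, b3) at time t."""
    return sum(c * x for c, x in zip(coefs, segmented_row(t, event_time)))

# Hypothetical baseline level, baseline trend, and post-event trend change;
# the level change of 1023 visits per day is the value reported above.
coefs = (8600.0, 1.5, 1023.0, -4.0)
event_day = 60

before = predict(coefs, event_day - 1, event_day)
after = predict(coefs, event_day, event_day)
print(f"Day before event: {before:.1f}; event day: {after:.1f}")
```

The coefficient on the post-event indicator is the immediate jump in daily visits, and the coefficient on time since the event describes how quickly visits return toward the baseline trend.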

  • Article type: Case Reports
    OBJECTIVE: Neurotoxicity is a side effect of acyclovir. We report the first case, to our knowledge, whereby Bayesian-informed clearance estimates supported a therapeutic intervention for acyclovir-associated neurotoxicity.
    METHODS: A 62-year-old male with the diagnosis of disseminated zoster was being treated with intravenous (IV) acyclovir when he developed symptoms of acute neurotoxicity. Acyclovir had been dose-adjusted for renal dysfunction according to traditional creatinine clearance estimates; however, as the patient was also on vancomycin, Bayesian estimates of vancomycin clearances were performed, which revealed a 2-fold lower creatinine clearance. In response to the Bayesian estimates, acyclovir was discontinued, and improvements in mentation were noted within 24 hours.
    CONCLUSIONS: Alternate approaches to estimate renal function beyond Cockcroft-Gault, such as a Bayesian approach used in our patient, should be considered when population estimates are likely to be inaccurate and potentially dangerous to the patient.
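
For reference, the Cockcroft-Gault estimate mentioned above is CrCl (mL/min) = (140 - age) × weight (kg) / (72 × serum creatinine in mg/dL), multiplied by 0.85 for women. The sketch below implements that formula. Only the age of 62 comes from the case; the weight and serum creatinine are hypothetical, and the Bayesian vancomycin-based estimate used for the patient is not reproduced.

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female=False):
    """Estimated creatinine clearance in mL/min by the Cockcroft-Gault formula."""
    crcl = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    return crcl * 0.85 if female else crcl

# Age from the case report; weight and creatinine are hypothetical.
estimate = cockcroft_gault(age_years=62, weight_kg=72, serum_creatinine_mg_dl=1.0)
print(f"Cockcroft-Gault CrCl: {estimate:.0f} mL/min")
```

Because the formula relies on serum creatinine as a population-level proxy for renal function, it can overestimate clearance in some patients, which is the concern the case report raises.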