Statistical model

  • Article type: Case Reports
    Two probabilistic genotyping (PG) programs, STRMix™ and TrueAllele™, were used to assess the strength of the same item of DNA evidence in a federal criminal case, with strikingly different results. For STRMix, the reported likelihood ratio in favor of the non-contributor hypothesis was 24; for TrueAllele it ranged from 1.2 million to 16.7 million, depending on the reference population. This case report seeks to explain why the two programs produced different results and to consider what the difference tells us about the reliability and trustworthiness of these programs. It uses a locus-by-locus breakdown to trace the differing results to subtle differences in modeling parameters and methods, analytic thresholds, and mixture ratios, as well as TrueAllele's use of an ad hoc procedure for assigning LRs at some loci. These findings illustrate the extent to which PG analysis rests on a lattice of contestable assumptions, highlighting the importance of rigorous validation of PG programs using known-source test samples that closely replicate the characteristics of evidentiary samples. The article also points out misleading aspects of the way STRMix and TrueAllele results are routinely presented in reports and testimony and calls for clarification of forensic reporting standards to address those problems.
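    A locus-by-locus breakdown matters because modest per-locus differences compound when per-locus likelihood ratios are multiplied together. The sketch below illustrates this, assuming loci are treated as independent so that the overall LR is the product of per-locus LRs. The per-locus values and the names `program_a` and `program_b` are hypothetical and are not taken from the case.

```python
import math

def combined_lr(per_locus_lrs):
    """Combine per-locus likelihood ratios by multiplication.

    Assumes independence across loci (an assumption of this sketch).
    Sums log10 values to avoid overflow with many loci.
    """
    log10_total = sum(math.log10(lr) for lr in per_locus_lrs)
    return 10 ** log10_total

# Hypothetical per-locus LRs; program_b assigns twice the LR at every locus.
program_a = [2.0, 1.5, 0.5, 10.0]
program_b = [2 * lr for lr in program_a]

total_a = combined_lr(program_a)
total_b = combined_lr(program_b)
print(f"program_a LR: {total_a:.1f}")
print(f"program_b LR: {total_b:.1f}")
print(f"ratio: {total_b / total_a:.1f}")
```

    With only four loci, a twofold difference per locus already yields a sixteenfold difference overall; the gap grows geometrically with the number of loci.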






  • Article type: Systematic Review
    BACKGROUND: Leprosy is an infectious disease caused by Mycobacterium leprae and remains a source of preventable disability if left undetected. Case detection delay is an important epidemiological indicator for progress in interrupting transmission and preventing disability in a community. However, no standard method exists to effectively analyse and interpret this type of data. In this study, we aim to evaluate the characteristics of leprosy case detection delay data and select an appropriate model for the variability of detection delays based on the best fitting distribution type.
    METHODS: Two sets of leprosy case detection delay data were evaluated: a cohort of 181 patients from the post exposure prophylaxis for leprosy (PEP4LEP) study in high endemic districts of Ethiopia, Mozambique, and Tanzania; and self-reported delays from 87 individuals in 8 low endemic countries collected as part of a systematic literature review. Bayesian models were fit to each dataset to assess which probability distribution (log-normal, gamma or Weibull) best describes variation in observed case detection delays using leave-one-out cross-validation, and to estimate the effects of individual factors.
    RESULTS: For both datasets, detection delays were best described with a log-normal distribution combined with covariates age, sex and leprosy subtype [expected log predictive density (ELPD) for the joint model: -1123.9]. Patients with multibacillary (MB) leprosy experienced longer delays compared to paucibacillary (PB) leprosy, with a relative difference of 1.57 [95% Bayesian credible interval (BCI): 1.14-2.15]. Those in the PEP4LEP cohort had 1.51 (95% BCI: 1.08-2.13) times longer case detection delay compared to the self-reported patient delays in the systematic review.
    CONCLUSIONS: The log-normal model presented here could be used to compare leprosy case detection delay datasets, including PEP4LEP where the primary outcome measure is reduction in case detection delay. We recommend the application of this modelling approach to test different probability distributions and covariate effects in studies with similar outcomes in the field of leprosy and other skin-NTDs.
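    The relative differences reported above are multiplicative effects on the delay scale. As a minimal sketch of why, assume a log-normal model with a single binary covariate (here, leprosy subtype): the fitted coefficient is then the difference in mean log delay, and its exponential is the ratio of geometric means. The delay values below are hypothetical, and the calculation is a plain maximum-likelihood illustration, not the Bayesian model used in the study.

```python
import math

def mean_log(values):
    """Mean of the natural logarithms of the values."""
    return sum(math.log(v) for v in values) / len(values)

# Hypothetical detection delays in months (not study data).
pb_delays = [6.0, 12.0, 24.0]   # paucibacillary
mb_delays = [12.0, 24.0, 48.0]  # multibacillary

# Coefficient on the log scale, then back-transformed to a ratio.
log_scale_effect = mean_log(mb_delays) - mean_log(pb_delays)
relative_difference = math.exp(log_scale_effect)
print(f"relative difference (MB vs PB): {relative_difference:.2f}")
```

    Read this way, a reported relative difference of 1.57 corresponds to a log-scale coefficient of about ln(1.57) ≈ 0.45.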






  • Article type: Journal Article
    OBJECTIVE: The main objective of a state Behavioral Risk Factor Surveillance System (BRFSS) is to produce reliable state-level estimates of various population health outcomes. A multilevel regression and post-stratification (MRP) methodology for small area estimation has been applied in the 500 Cities Project to provide population estimates at both city and census-tract level using national BRFSS data. To date, MRP has not been applied to any state BRFSS to produce health data for local geographic areas. In addition, the use of single-year BRFSS data might produce temporal inconsistency in small area estimates (SAEs), and the predicted standard errors (SEs) and confidence intervals (CIs) of SAEs obtained by Monte Carlo simulation could be substantially underestimated or overestimated.
    METHODS: By extending the current MRP approach and applying a parametric bootstrapping approach to the Connecticut BRFSS (CT BRFSS), we were able to produce SAEs, together with their SEs and CIs, for Connecticut counties and towns. We also applied this model to 5-year CT BRFSS data (2011-2015) with the aim of improving the temporal consistency of SAEs.
    RESULTS: Both single-year and 5-year estimates with SEs and CIs were generated for six selected population health indicators at town, county and state levels. Model-based SAEs were internally evaluated by comparing them with single-year and 5-year direct BRFSS survey estimates (2011-2015). SAEs were also externally validated when external data were available.
    CONCLUSIONS: Model-based SAEs are valid and could be used to characterize local geographic variations using single state BRFSS data.
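    The parametric bootstrap mentioned in the methods can be sketched in a deliberately simplified form: simulate new data from the fitted model, re-estimate on each simulated dataset, and read the SE and CI off the replicates. The example below assumes a single area with a binomial outcome; it is not the MRP model from the study, and `p_hat`, `n`, and `n_boot` are hypothetical values.

```python
import random
import statistics

def parametric_bootstrap(p_hat, n, n_boot=2000, seed=0):
    """Bootstrap SE and 95% percentile CI for a prevalence estimate.

    Each replicate draws n Bernoulli outcomes from the fitted
    prevalence p_hat and re-estimates the prevalence.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        successes = sum(rng.random() < p_hat for _ in range(n))
        replicates.append(successes / n)
    replicates.sort()
    se = statistics.stdev(replicates)
    lower = replicates[int(0.025 * n_boot)]
    upper = replicates[int(0.975 * n_boot) - 1]
    return se, (lower, upper)

se, ci = parametric_bootstrap(p_hat=0.25, n=400)
print(f"SE: {se:.4f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

    For this simple case the bootstrap SE should land close to the analytic value sqrt(p(1-p)/n); the approach earns its keep in multilevel models where no such closed form exists.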






  • Article type: Journal Article
    Real-time prediction of surgical duration can inform perioperative decisions and reduce surgical costs. We developed a machine learning approach that continuously incorporates preoperative and intraoperative information for forecasting surgical duration.
    Preoperative (e.g. procedure name) and intraoperative (e.g. medications and vital signs) variables were retrieved from anaesthetic records of surgeries performed between March 1, 2019 and October 31, 2019. A modular artificial neural network was developed and compared with a Bayesian approach and the scheduled surgical duration. Continuous ranked probability score (CRPS) was used as a measure of time error to assess model accuracy. For evaluating clinical performance, accuracy for each approach was assessed in identifying cases that ran beyond 15:00 (commonly scheduled end of shift), thus identifying opportunities to avoid overtime labour costs.
    The analysis included 70 826 cases performed at eight hospitals. The modular artificial neural network had the lowest time error (CRPS: mean=13.8; standard deviation=35.4 min), which was significantly better (mean difference=6.4 min [95% confidence interval: 6.3-6.5]; P<0.001) than the Bayesian approach. The modular artificial neural network also had the highest accuracy in identifying operating theatres that would overrun 15:00 (accuracy at 1 h prior=89%) compared with the Bayesian approach (80%) and a naïve approach using the scheduled duration (78%).
    A real-time neural network model using preoperative and intraoperative data had significantly better performance than a Bayesian approach or scheduled duration, offering opportunities to avoid overtime labour costs and reduce the cost of surgery by providing superior real-time information for perioperative decision support.
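    The continuous ranked probability score used above generalises absolute error to probabilistic forecasts. A minimal sketch follows, assuming the forecast is represented by a finite sample (an ensemble) of predicted durations and using the standard sample form CRPS = E|X - y| - 0.5 E|X - X'|. The function name and the example durations are hypothetical; the abstract does not say how the study computed the score.

```python
def crps_ensemble(forecast_samples, observed):
    """Empirical CRPS of a sample-based forecast against one observation."""
    m = len(forecast_samples)
    # Average distance between forecast samples and the observation.
    accuracy = sum(abs(x - observed) for x in forecast_samples) / m
    # Average distance between pairs of forecast samples (forecast spread).
    spread = sum(
        abs(xi - xj) for xi in forecast_samples for xj in forecast_samples
    ) / (m * m)
    return accuracy - 0.5 * spread

# Hypothetical surgical durations in minutes.
point_forecast = crps_ensemble([100.0], observed=120.0)
two_member = crps_ensemble([90.0, 110.0], observed=100.0)
print(point_forecast, two_member)
```

    For a single-valued forecast the score reduces to the absolute error, which is why it can be reported in minutes as a "time error".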






  • Article type: Journal Article
    Supervised machine learning (ML) is being featured in the health care literature with study results frequently reported using metrics such as accuracy, sensitivity, specificity, recall, or F1 score. Although each metric provides a different perspective on performance, they all remain overall measures for the whole sample, discounting the uniqueness of each case or patient. Intuitively, we know that all cases are not equal, but the present evaluative approaches do not take case difficulty into account.
    A more case-based, comprehensive approach is warranted to assess supervised ML outcomes and forms the rationale for this study. This study aims to demonstrate how the item response theory (IRT) can be used to stratify the data based on how difficult each case is to classify, independent of the outcome measure of interest (eg, accuracy). This stratification allows the evaluation of ML classifiers to take the form of a distribution rather than a single scalar value.
    Two large, public intensive care unit data sets, Medical Information Mart for Intensive Care III and electronic intensive care unit, were used to showcase this method in predicting mortality. For each data set, a balanced sample (n=8078 and n=21,940, respectively) and an imbalanced sample (n=12,117 and n=32,910, respectively) were drawn. A 2-parameter logistic model was used to provide scores for each case. Several ML algorithms were used in the demonstration to classify cases based on their health-related features: logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, naive Bayes, and a neural network. Generalized linear mixed model analyses were used to assess the effects of case difficulty strata, ML algorithm, and the interaction between them in predicting accuracy.
    The results showed significant effects (P<.001) for case difficulty strata, ML algorithm, and their interaction in predicting accuracy and illustrated that all classifiers performed better with easier-to-classify cases and that overall the neural network performed best. Significant interactions suggest that cases that fall in the most arduous strata should be handled by logistic regression, linear discriminant analysis, decision tree, or neural network but not by naive Bayes or K-nearest neighbors. Conventional metrics for ML classification have been reported for methodological comparison.
    This demonstration shows that using the IRT is a viable method for understanding the data that are provided to ML algorithms, independent of outcome measures, and highlights how well classifiers differentiate cases of varying difficulty. This method explains which features are indicative of healthy states and why. It enables end users to tailor the classifier that is appropriate to the difficulty level of the patient for personalized medicine.
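    The 2-parameter logistic model referred to above has a simple closed form, sketched here in its textbook version: the probability of a correct response depends on an ability parameter, an item discrimination, and an item difficulty. The abstract does not describe how classifiers and cases were mapped onto respondents and items, so the parameter names and values below are illustrative assumptions.

```python
import math

def two_pl_probability(theta, discrimination, difficulty):
    """Textbook 2PL item response function.

    theta: ability of the respondent.
    discrimination: how sharply the item separates abilities (a).
    difficulty: ability at which the success probability is 0.5 (b).
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical parameters: one respondent, an easy and a hard case.
p_easy = two_pl_probability(theta=0.0, discrimination=1.5, difficulty=-1.0)
p_hard = two_pl_probability(theta=0.0, discrimination=1.5, difficulty=1.0)
print(f"easy case: {p_easy:.3f}, hard case: {p_hard:.3f}")
```

    Sorting cases by the fitted difficulty parameter is one way to form the difficulty strata against which classifier accuracy can then be compared.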






  • Article type: Journal Article
    The mainstream interventions used during the 2014-2016 Ebola epidemic were contact tracing and case isolation. The Ebola outbreak in Nigeria that formed part of the 2014-2016 epidemic demonstrated the effectiveness of control interventions with a 100% hospitalization rate. Here, we aim to explicitly estimate the protective effect of case isolation, reconstructing the time events of onset of illness and hospitalization as well as the transmission network. We show that case isolation reduced the reproduction number and shortened the serial interval. Employing Bayesian inference with the Markov chain Monte Carlo method for parameter estimation and assuming that the reproduction number exponentially declines over time, the protective effect of case isolation was estimated to be 39.7% (95% credible interval: 2.4%-82.1%). The individual protective effect of case isolation was also estimated, showing that the effectiveness was dependent on the speed, i.e. the time from onset of illness to hospitalization.







  • Article type: Journal Article
    BACKGROUND: Malaria transmission is influenced by a complex interplay of factors including climate, socio-economic, environmental factors and interventions. Malaria control efforts across Africa have shown a mixed impact. Climate driven factors may play an increasing role with climate change. Efforts to strengthen routine facility-based monthly malaria data collection across Africa create an increasingly valuable data source to interpret burden trends and monitor control programme progress. A better understanding of the association with other climatic and non-climatic drivers of malaria incidence over time and space may help guide and interpret the impact of interventions.
    METHODS: Routine monthly paediatric outpatient clinical malaria case data were compiled from 27 districts in Malawi between 2004 and 2017, and analysed in combination with data on climatic, environmental, socio-economic and interventional factors and district level population estimates. A spatio-temporal generalized linear mixed model was fitted using Bayesian inference, in order to quantify the strength of association of the various risk factors with district-level variation in clinical malaria rates in Malawi, and visualized using maps.
    RESULTS: Between 2004 and 2017, reported childhood clinical malaria case rates showed a slight increase, from 50 to 53 cases per 1000 population, with considerable variation across the country between climatic zones. Climatic and environmental factors, including average monthly air temperature and rainfall anomalies and the normalized difference vegetation index (NDVI), as well as rapid diagnostic test (RDT) use for diagnosis, showed a significant relationship with malaria incidence. Temperature in the current month and in each of the 3 months prior showed a significant relationship with disease incidence, whereas rainfall anomaly was associated with malaria incidence only at three months prior. Estimated risk maps show relatively high risk along the lake and Shire valley regions of Malawi.
    CONCLUSIONS: The modelling approach can identify locations likely to have unusually high or low risk of malaria incidence across Malawi, and distinguishes between contributions to risk that can be explained by measured risk factors and unexplained residual spatial variation. In addition, spatial statistical methods applied to readily available routine data provide an alternative information source that can supplement survey data in policy development and implementation to direct surveillance and intervention efforts.







  • The first Ebola virus disease (EVD) case in the United States (US) was confirmed on September 30, 2014, in a 45-year-old man. This event attracted considerable media attention, and there was fear of an EVD outbreak in the US.
    This study examined whether emergency department (ED) visits changed in metropolitan Dallas-Fort Worth, Texas (DFW) after this EVD case was confirmed. Using Texas Health Services Region 2/3 syndromic surveillance data and focusing on DFW, interrupted time series analyses were conducted using segmented regression models with autoregressive errors for overall ED visits and rates of several chief complaints, including fever with gastrointestinal distress (FGI). The date of confirmation of the fatal case was the "event."
    Results indicated the event was highly significant for ED visits overall (P<0.05) and for the rate of FGI visits (P<0.0001). An immediate increase in total ED visits of 1,023 visits per day (95% CI: 797.0, 1,252.8) was observed, equivalent to an 11.8% (95% CI: 9.2%, 14.4%) increase in ED visits overall. Total visits and the rate of FGI visits in DFW increased significantly immediately after confirmation of the EVD case and remained elevated for several months, even after adjusting for seasonality, both within symptom-specific chief complaints and overall.
    These results have implications for ED surge capacity as well as for public health messaging in the wake of a public health emergency.
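    For orientation, a common parameterisation of a segmented regression with first-order autoregressive errors is given below. It is offered as a sketch under stated assumptions: the abstract does not give the exact specification (for example, the autoregressive order or the seasonal terms), and all symbols are introduced here for illustration. Y_t is the daily count or rate, t_0 the event date, and D_t an indicator equal to 1 for t >= t_0.

```latex
\begin{aligned}
Y_t &= \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - t_0) D_t + \varepsilon_t, \\
\varepsilon_t &= \rho\, \varepsilon_{t-1} + u_t, \qquad u_t \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

    In this form, \beta_2 is the immediate level change at the event (the quantity reported in visits per day) and \beta_3 is the change in trend afterwards.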






  • Article type: Case Reports
    OBJECTIVE: Neurotoxicity is a side effect of acyclovir. We report the first case, to our knowledge, whereby Bayesian-informed clearance estimates supported a therapeutic intervention for acyclovir-associated neurotoxicity.
    METHODS: A 62-year-old male with the diagnosis of disseminated zoster was being treated with intravenous (IV) acyclovir when he developed symptoms of acute neurotoxicity. Acyclovir had been dose-adjusted for renal dysfunction according to traditional creatinine clearance estimates; however, as the patient was also on vancomycin, Bayesian estimates of vancomycin clearances were performed, which revealed a 2-fold lower creatinine clearance. In response to the Bayesian estimates, acyclovir was discontinued, and improvements in mentation were noted within 24 hours.
    CONCLUSIONS: Alternate approaches to estimate renal function beyond Cockcroft-Gault, such as a Bayesian approach used in our patient, should be considered when population estimates are likely to be inaccurate and potentially dangerous to the patient.
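    For reference, the Cockcroft-Gault estimate that the Bayesian approach is contrasted with can be written as a short function. The formula is the standard one (creatinine clearance in mL/min from age, weight, serum creatinine, and sex); the weight and serum creatinine below are hypothetical, since the report summary gives only the patient's age and sex.

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female=False):
    """Estimated creatinine clearance in mL/min (Cockcroft-Gault)."""
    clearance = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
    # The conventional correction factor for female patients is 0.85.
    return clearance * 0.85 if female else clearance

# 62-year-old male as in the case; weight and creatinine are hypothetical.
estimate = cockcroft_gault(age_years=62, weight_kg=72.0, serum_creatinine_mg_dl=1.0)
print(f"Cockcroft-Gault estimate: {estimate:.1f} mL/min")

# A 2-fold lower clearance, as the Bayesian estimate suggested in the case.
print(f"Half of that estimate: {estimate / 2:.1f} mL/min")
```

    Because the estimate depends on a single serum creatinine value and population-level assumptions, it can diverge sharply from drug-specific clearance estimates in an individual patient, which is the discrepancy the case illustrates.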





