Keywords: Epidemiology; Gaussian process; dengue fever; generalized linear (autoregressive) model; heteroskedastic modeling; latent variable

Source: DOI:10.1214/17-aoas1090

Abstract:
In 2015 the US federal government sponsored a dengue forecasting competition using historical case data from Iquitos, Peru, and San Juan, Puerto Rico. Competitors were evaluated on several aspects of out-of-sample forecasts, including the targets of peak week, peak incidence during that week, and total season incidence, across each of several seasons. Our team was one of the winners of that competition, outperforming other teams in multiple targets/locales. In this paper we report on our methodology, a large component of which, surprisingly, ignores the known biology of epidemics at large (for example, relationships between dengue transmission and environmental factors) and instead relies on flexible nonparametric nonlinear Gaussian process (GP) regression fits that "memorize" the trajectories of past seasons and then "match" the dynamics of the unfolding season to past ones in real time. Our phenomenological approach has advantages in situations where disease dynamics are less well understood, where measurements and forecasts of ancillary covariates like precipitation are unavailable, and/or where the strength of association with cases is as yet unknown. In particular, we show that the GP approach generally outperforms a more classical generalized linear (autoregressive) model (GLM) that we developed to utilize abundant covariate information. We illustrate variations of our method(s) on the two benchmark locales alongside a full summary of results submitted by other contest competitors.
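The abstract describes the "memorize and match" idea only at a high level. As a rough illustration, and explicitly not the authors' model, the sketch below fits a plain homoskedastic GP regression in NumPy to synthetic past-season trajectories indexed by week and a hypothetical per-season "level" feature. It then "matches" the current season by choosing the level whose GP mean best fits the weeks observed so far, and reads off a peak-week forecast. The kernel, hyperparameters, synthetic data, and helper names (`sq_exp_kernel`, `gp_predict`, `match_level`, `season_curve`) are all assumptions; the paper's heteroskedastic GP and latent-variable construction are not reproduced here.

```python
import numpy as np


def sq_exp_kernel(A, B, lengthscales, amplitude=1.0):
    """Squared-exponential (RBF) kernel with one lengthscale per input column."""
    A = A / lengthscales
    B = B / lengthscales
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return amplitude * np.exp(-0.5 * np.maximum(sq_dist, 0.0))


def gp_predict(X_train, y_train, X_test, lengthscales, amplitude=1.0, noise=0.01):
    """Posterior mean and latent variance of a GP with a constant (empirical) mean."""
    y_mean = y_train.mean()
    K = sq_exp_kernel(X_train, X_train, lengthscales, amplitude) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = sq_exp_kernel(X_train, X_test, lengthscales, amplitude)
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.maximum(amplitude - np.sum(v ** 2, axis=0), 0.0)
    return mean, var


def match_level(X_train, y_train, obs_weeks, obs_y, candidates, **gp_kw):
    """'Match' the unfolding season: pick the level whose GP mean best fits data so far."""
    sse = []
    for z in candidates:
        X_obs = np.column_stack([obs_weeks, np.full(len(obs_weeks), z)])
        mean, _ = gp_predict(X_train, y_train, X_obs, **gp_kw)
        sse.append(np.sum((mean - obs_y) ** 2))
    return candidates[int(np.argmin(sse))]


rng = np.random.default_rng(0)
weeks = np.arange(52)


def season_curve(level):
    # Hypothetical synthetic log-incidence trajectory: a bump peaking at week 30.
    return level * np.exp(-0.5 * ((weeks - 30) / 8.0) ** 2)


# "Memorize" past seasons: training inputs are (week, season level) pairs.
past_levels = np.array([1.0, 2.0, 3.0, 4.0])
X_train = np.array([[w, lv] for lv in past_levels for w in weeks], dtype=float)
y_train = np.concatenate([season_curve(lv) + 0.05 * rng.standard_normal(52) for lv in past_levels])

gp_kw = dict(lengthscales=np.array([5.0, 1.0]), amplitude=2.0, noise=0.01)

# Current season: only the first 20 weeks have been observed.
current = season_curve(2.5) + 0.05 * rng.standard_normal(52)
n_obs = 20
candidates = np.linspace(0.5, 4.5, 41)
z_hat = match_level(X_train, y_train, weeks[:n_obs], current[:n_obs], candidates, **gp_kw)

# Forecast the whole season at the matched level and read off the peak week.
X_all = np.column_stack([weeks, np.full(52, z_hat)])
forecast, forecast_var = gp_predict(X_train, y_train, X_all, **gp_kw)
peak_week = int(np.argmax(forecast))
print(f"matched level = {z_hat:.2f}, forecast peak week = {peak_week}")
```

The grid search over a single scalar keeps the "matching" step transparent; a fuller treatment would instead infer a latent season-level variable with uncertainty and let the noise variance vary across the season.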