BACKGROUND: In recent years, a range of novel smartphone-derived data streams about human mobility have become available on a near-real-time basis. These data have been used, for example, to perform traffic forecasting and epidemic modeling. During the COVID-19 pandemic in particular, human travel behavior has been considered a key component of epidemiological modeling, both to provide more reliable estimates of the pandemic's importation volumes and transmission routes and to identify hot spots. However, the representativeness of these data, that is, how they relate to the underlying real-world human mobility, has been overlooked nearly universally in the literature. This disconnect between data and reality is especially relevant in the case of socially disadvantaged minorities.
OBJECTIVE: The objective of this study is to illustrate the nonrepresentativeness of data on human mobility and the impact of this nonrepresentativeness on the modeling of epidemic dynamics. This study systematically evaluates how real-world travel flows differ from census-based estimates, especially in the case of socially disadvantaged minorities, such as older adults and women, and further measures the biases that this difference introduces in epidemiological studies.
METHODS: To understand the demographic composition of population movements, a nationwide mobility data set from 318 million mobile phone users in China from January 1 to February 29, 2020, was curated. Specifically, we quantified the disparity in population composition between actual migrations and the resident composition according to census data, and showed how this nonrepresentativeness impacts epidemiological modeling by constructing an age-structured SEIR (Susceptible-Exposed-Infected-Recovered) model of COVID-19 transmission.
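The age-structured SEIR approach named above can be illustrated with a minimal sketch. This is not the study's model: the contact matrix, the transmission and progression rates, the total population, and the seeding are all hypothetical placeholders, and the third age group's share is an assumption obtained by subtracting the two shares reported in the Results from 1. The sketch only shows how the assumed age composition enters the force of infection.

```python
# Minimal age-structured SEIR sketch (forward Euler). All parameters are
# hypothetical placeholders, not values taken from the study.

def simulate_seir(pop, contact, beta=0.05, sigma=1 / 5.2, gamma=1 / 5.0,
                  seed=10.0, days=300, dt=0.1):
    """Return (daily new infections, final [S, E, I, R] lists per age group)."""
    k = len(pop)
    total = sum(pop)
    I = [seed * n / total for n in pop]      # seed cases split by group size
    S = [n - i for n, i in zip(pop, I)]
    E = [0.0] * k
    R = [0.0] * k
    steps_per_day = round(1 / dt)
    daily = []
    for _ in range(days):
        new_today = 0.0
        for _ in range(steps_per_day):
            # Force of infection on group i: beta * sum_j C[i][j] * I_j / N_j
            lam = [beta * sum(contact[i][j] * I[j] / pop[j] for j in range(k))
                   for i in range(k)]
            for i in range(k):
                infected = lam[i] * S[i] * dt    # S -> E
                onset = sigma * E[i] * dt        # E -> I
                recovered = gamma * I[i] * dt    # I -> R
                S[i] -= infected
                E[i] += infected - onset
                I[i] += onset - recovered
                R[i] += recovered
                new_today += infected
        daily.append(new_today)
    return daily, [S, E, I, R]


TOTAL = 1_000_000  # hypothetical population size

# Hypothetical mean daily contacts between (young, middle-aged, other adults).
# A fixed matrix is used for both scenarios purely for illustration.
CONTACT = [[12.0, 5.0, 2.0],
           [5.0, 8.0, 2.0],
           [2.0, 2.0, 4.0]]

# Young and middle-aged shares are those reported in the Results; the third
# share is an assumed remainder (1 minus the other two).
SHARES = {
    "travelers": [0.59, 0.36, 1 - 0.59 - 0.36],
    "census": [0.36, 0.40, 1 - 0.36 - 0.40],
}

results = {}
for name, shares in SHARES.items():
    pop = [s * TOTAL for s in shares]
    daily, final = simulate_seir(pop, CONTACT)
    peak_day = max(range(len(daily)), key=daily.__getitem__)
    results[name] = (daily, final)
    print(f"{name}: peak of {daily[peak_day]:.0f} new infections on day {peak_day}")
```

Running the same dynamics under the two compositions gives two epidemic curves whose peak size and timing can be compared directly. The magnitudes printed here depend entirely on the placeholder parameters and should not be read as reproducing the study's estimates.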
RESULTS: We found a significant difference in demographic composition between those who travel and the overall population. In the population flows, 59% (n=20,067,526) of travelers are young and 36% (n=12,210,565) are middle-aged (P<.001), which differs markedly from the overall adult population composition of China (where 36% of individuals are young and 40% are middle-aged). This difference would introduce a striking bias in epidemiological studies: the estimated maximum number of daily infections differs by a factor of nearly 3, and the estimated peak time differs by 46 days.
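One simple way to summarize the compositional gap reported above is the total variation distance between the two age distributions. The sketch below is illustrative only and is not the statistical test used in the study; the "other" category is an assumption, computed as the remainder after the young and middle-aged shares.

```python
# Total variation distance between traveler and census age compositions.
# "other" is an assumed remainder category, not a figure from the abstract.

def with_remainder(shares):
    """Add an 'other' category so that the shares sum to 1."""
    completed = dict(shares)
    completed["other"] = 1.0 - sum(shares.values())
    return completed


def total_variation(p, q):
    """Half the sum of absolute differences between two distributions."""
    return 0.5 * sum(abs(p[key] - q[key]) for key in p)


travelers = with_remainder({"young": 0.59, "middle_aged": 0.36})
census = with_remainder({"young": 0.36, "middle_aged": 0.40})

tvd = total_variation(travelers, census)
print(f"Total variation distance: {tvd:.2f}")
```

Under this assumption, the distance can be read as the share of travelers that would have to be reassigned to a different age group for the traveler composition to match the census composition.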
CONCLUSIONS: The difference between actual migrations and resident composition strongly impacts outcomes of epidemiological forecasts, which typically assume that flows represent underlying demographics. Our findings imply that it is necessary to measure and quantify the inherent biases related to nonrepresentativeness for accurate epidemiological surveillance and forecasting.