MeSH terms: Crowdsourcing; Biodiversity; Cities; Social Environment; Boston

Source: DOI: 10.1371/journal.pone.0277223

Abstract:
Credibly estimating social-ecological relationships requires data with broad coverage and fine geographic resolutions that are not typically available from standard ecological surveys. Open and unstructured data from crowdsourced platforms offer an opportunity for collecting large quantities of user-submitted ecological data. However, the representativeness of the areas sampled by these data portals is not well known. We investigate how data availability in eBird, one of the largest and most popular crowdsourced science platforms, correlates with race and income of census tracts in two cities: Boston, MA and Phoenix, AZ. We find that checklist submissions vary greatly across census tracts, with similar patterns within both metropolitan regions. In particular, census tracts with high income and high proportions of white residents are most likely to be represented in the data in both cities, which indicates selection bias in eBird coverage. Our results illustrate the non-representativeness of eBird data, and they also raise deeper questions about the validity of statistical inferences regarding disparities that can be drawn from such datasets. We discuss these challenges and illustrate how sample selection problems in unstructured or semi-structured crowdsourced data can lead to spurious conclusions regarding the relationships between race, income, and access to urban bird biodiversity. While crowdsourced data are indispensable and complementary to more traditional approaches for collecting ecological data, we conclude that unstructured or semi-structured data may not be well-suited for all lines of inquiry, particularly those requiring consistent data coverage, and should thus be handled with appropriate care.
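The abstract's central methodological warning is that uneven sampling effort can manufacture an apparent link between income and bird diversity. A toy simulation makes the mechanism concrete. This is a hypothetical sketch, not the authors' analysis: tract incomes, species counts, per-checklist detection probabilities, and the effort model (`mean_lists`) are all invented for illustration. True species richness is drawn independently of income, but the number of submitted checklists grows with income, so the richness recorded in the data still ends up correlated with income.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n_tracts=300, seed=1):
    """Return (corr(income, true richness), corr(income, observed richness)).

    All parameters are hypothetical; they only illustrate how effort-driven
    selection can create a spurious income-biodiversity correlation.
    """
    rng = random.Random(seed)
    incomes, true_rich, observed_rich = [], [], []
    for _ in range(n_tracts):
        income = rng.gauss(0.0, 1.0)    # standardized tract income (hypothetical)
        richness = rng.randint(30, 60)  # true species count, independent of income
        # Per-checklist detection probability for each species (hypothetical).
        detect_p = [rng.uniform(0.02, 0.3) for _ in range(richness)]
        # Selection mechanism: expected number of checklists rises with income.
        mean_lists = math.exp(0.5 + 0.9 * income)
        n_lists = poisson(rng, mean_lists)
        # Observed richness = species detected on at least one checklist.
        seen = set()
        for _ in range(n_lists):
            for species, p in enumerate(detect_p):
                if rng.random() < p:
                    seen.add(species)
        incomes.append(income)
        true_rich.append(richness)
        observed_rich.append(len(seen))
    return pearson(incomes, true_rich), pearson(incomes, observed_rich)


if __name__ == "__main__":
    r_true, r_obs = simulate()
    print(f"corr(income, true richness)     = {r_true:+.2f}")
    print(f"corr(income, observed richness) = {r_obs:+.2f}")
```

Under these assumptions the first correlation should sit near zero while the second is clearly positive; the exact values depend on the seed and the invented parameters. The gap arises purely because richer tracts receive more checklists, so more of their species get recorded, which is the kind of sample selection problem the abstract describes.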