Assessing and Improving Data Integrity in Web-Based Surveys: Comparison of Fraud Detection Systems in a COVID-19 Study

Abstract:

BACKGROUND: Web-based surveys increase access to study participation and improve opportunities to reach diverse populations. However, web-based surveys are vulnerable to data quality threats, including fraudulent entries from automated bots and duplicative submissions. Widely used proprietary tools to identify fraud offer little transparency about the methods used, effectiveness, or representativeness of resulting data sets. Robust, reproducible, and context-specific methods of accurately detecting fraudulent responses are needed to ensure integrity and maximize the value of web-based survey research.
OBJECTIVE: This study aims to describe a multilayered fraud detection system implemented in a large web-based survey about COVID-19 attitudes, beliefs, and behaviors; examine the agreement between this fraud detection system and a proprietary fraud detection system; and compare the resulting study samples from each of the 2 fraud detection methods.
METHODS: The PhillyCEAL Common Survey is a cross-sectional web-based survey that remotely enrolled residents aged 13 years and older to assess how the COVID-19 pandemic impacted individuals, neighborhoods, and communities in Philadelphia, Pennsylvania. Two fraud detection methods are described and compared: (1) a multilayer fraud detection strategy developed by the research team that combined automated validation of response data and real-time verification of study entries by study personnel and (2) the proprietary fraud detection system used by the Qualtrics survey platform. Descriptive statistics were computed for the full sample and for the responses classified as valid by each of the 2 fraud detection methods, and classification tables were created to assess agreement between the methods. The impact of fraud detection methods on the distribution of vaccine confidence by racial or ethnic group was assessed.
RESULTS: Of 7950 completed surveys, our multilayer fraud detection system identified 3228 (40.60%) cases as valid, while the Qualtrics fraud detection system identified 4389 (55.21%) cases as valid. The 2 methods showed only "fair" or "minimal" agreement in their classifications (κ=0.25; 95% CI 0.23-0.27). The choice of fraud detection method impacted the distribution of vaccine confidence by racial or ethnic group.
CONCLUSIONS: The selection of a fraud detection method can affect the study's sample composition. The findings of this study, while not conclusive, suggest that a multilayered approach to fraud detection that includes conservative use of automated fraud detection and integration of human review of entries tailored to the study's specific context and its participants may be warranted for future survey research.
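
NOTE ON THE AGREEMENT STATISTIC: The κ reported in the results is an agreement coefficient computed from a classification table of the 2 methods. The abstract does not state which variant was used or give the cell counts, so the following is only a minimal sketch that assumes the standard Cohen's kappa for a 2x2 table. The function names and argument names are illustrative and are not taken from the study.

```python
def cohen_kappa(both_valid, only_a_valid, only_b_valid, both_fraud):
    """Cohen's kappa for a 2x2 table of two binary classifiers (valid vs fraudulent).

    The four arguments are cell counts of a classification table; they are
    hypothetical inputs, since the abstract reports only the marginal totals.
    """
    n = both_valid + only_a_valid + only_b_valid + both_fraud
    # Observed agreement: share of cases where both methods give the same label.
    p_observed = (both_valid + both_fraud) / n
    # Marginal share of cases each method labels as valid.
    a_valid = (both_valid + only_a_valid) / n
    b_valid = (both_valid + only_b_valid) / n
    # Agreement expected by chance if the two methods were independent.
    p_expected = a_valid * b_valid + (1 - a_valid) * (1 - b_valid)
    return (p_observed - p_expected) / (1 - p_expected)


def chance_agreement(n, a_valid_count, b_valid_count):
    """Chance-expected agreement computed from the marginal totals alone."""
    a_valid = a_valid_count / n
    b_valid = b_valid_count / n
    return a_valid * b_valid + (1 - a_valid) * (1 - b_valid)


if __name__ == "__main__":
    # Marginals reported in the abstract: 7950 surveys, 3228 and 4389 valid.
    p_e = chance_agreement(7950, 3228, 4389)
    print(f"chance-expected agreement: {p_e:.3f}")
    # Observed agreement implied by the reported kappa of 0.25
    # (a back-calculation from rounded figures, not a value from the study).
    print(f"implied observed agreement: {0.25 + (1 - 0.25) * p_e:.3f}")
```

With the reported marginals, chance alone would produce agreement on roughly 49% of cases, and a κ of 0.25 corresponds to the 2 methods agreeing on roughly 62% of cases. These are back-calculations from the rounded figures in the abstract, not numbers reported by the authors.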
