BACKGROUND: Providing user-friendly electronic data collection tools for large multicenter studies is key to obtaining high-quality research data. Research Electronic Data Capture (REDCap) is a software solution for setting up research databases with integrated graphical user interfaces for electronic data entry. The Swiss Mother and Child HIV Cohort Study (MoCHiV) is a longitudinal cohort study with around 2 million data entries dating back to the early 1980s. Until 2022, data collection in MoCHiV was paper-based.
OBJECTIVE: The objective of this study was to provide a user-friendly graphical interface for electronic data entry for physicians and study nurses reporting MoCHiV data.
METHODS: MoCHiV collects information on obstetric events among women living with HIV and on children born to mothers living with HIV. Until 2022, MoCHiV data were stored in an Oracle SQL relational database. In this project, R and REDCap were used to develop an electronic data entry platform for MoCHiV and to migrate the data already collected.
RESULTS: The key steps for providing an electronic data entry option for MoCHiV were (1) design, (2) data cleaning and formatting, (3) migration and compliance, and (4) add-on features. In the first step, the database structure was defined in REDCap, including the specification of primary and foreign keys, the definition of study variables, and the hierarchy of questions (termed "branching logic"). In the second step, data stored in Oracle were cleaned and formatted to adhere to the defined database structure. Systematic data checks ensured compliance with all branching logic and with the permitted levels of categorical variables. REDCap-specific variables and the numbering of repeated events, which enable a relational data structure in REDCap, were generated using R. In the third step, data were imported into REDCap and then systematically compared with the original data. In the last step, add-on features, such as data access groups, redirections, and summary reports, were integrated to facilitate data entry in the multicenter MoCHiV study.
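As an illustration of steps 2 and 3, the following is a minimal sketch, not the study's actual pipeline (which was written in R and is not reproduced here). It numbers repeated events per participant, flags rows that violate a branching rule, and compares exported rows with their source. The field names `record_id`, `event_date`, `delivered`, and `delivery_mode`, the instrument name `obstetric_event`, and the sample rows are hypothetical. Only `redcap_repeat_instrument` and `redcap_repeat_instance` are field names that REDCap itself uses for repeating instruments.

```python
from collections import defaultdict


def number_repeated_events(rows, key="record_id", order_by="event_date",
                           instrument="obstetric_event"):
    """Return copies of rows with REDCap repeat fields added.

    Rows sharing the same key are sorted by `order_by` and numbered 1, 2, ...
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)

    numbered = []
    for record_id in sorted(grouped):
        events = sorted(grouped[record_id], key=lambda r: r[order_by])
        for instance, row in enumerate(events, start=1):
            numbered.append({
                **row,
                "redcap_repeat_instrument": instrument,
                "redcap_repeat_instance": instance,
            })
    return numbered


def branching_violations(rows, parent, shown_when, child):
    """List rows where `child` is filled although the branching rule hides it."""
    return [
        r for r in rows
        if r.get(parent) != shown_when and r.get(child) not in (None, "")
    ]


def compare_to_source(source_rows, exported_rows, key_fields):
    """Return the keys of rows that differ between source and exported data."""
    def index(rows):
        return {tuple(r[k] for k in key_fields): r for r in rows}

    src, exp = index(source_rows), index(exported_rows)
    return sorted(k for k in src.keys() | exp.keys() if src.get(k) != exp.get(k))


# Hypothetical sample rows (invented for illustration, not study data).
rows = [
    {"record_id": "M001", "event_date": "1999-05-02",
     "delivered": "1", "delivery_mode": "vaginal"},
    {"record_id": "M001", "event_date": "1996-01-15",
     "delivered": "0", "delivery_mode": "cesarean"},
    {"record_id": "M002", "event_date": "2003-11-30",
     "delivered": "1", "delivery_mode": ""},
]

numbered = number_repeated_events(rows)
violations = branching_violations(numbered, "delivered", "1", "delivery_mode")

# Simulate an export in which one value changed during import.
exported = [dict(r) for r in numbered]
exported[1]["delivery_mode"] = "unknown"
mismatches = compare_to_source(
    numbered, exported, key_fields=("record_id", "redcap_repeat_instance")
)
```

Keying the comparison on the record identifier plus the repeat instance is one plausible way to emulate a primary key and foreign key pair in a flat import file. A real migration would also need to handle events, arms, and data access groups.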
CONCLUSIONS: By combining different software tools (Oracle SQL, R, and REDCap) and building a systematic pipeline for data cleaning, formatting, and comparison, we were able to migrate a multicenter longitudinal cohort study from Oracle SQL to REDCap. REDCap offers a flexible way to develop customized study designs, even for longitudinal studies with different study arms (ie, obstetric events, women, and mother-child pairs). However, REDCap does not offer built-in tools for preprocessing large data sets before data import. Additional software (eg, R) is needed for data formatting and cleaning to achieve the predefined REDCap data structure.