Keywords: ASReview; Active learning; Noisily labeled records; Replication; Systematic reviews

MeSH: Humans; Borderline Personality Disorder; Datasets as Topic; Machine Learning; Meta-Analysis as Topic; Systematic Reviews as Topic

Source: DOI: 10.1186/s13643-024-02472-w

Abstract:
Systematic reviews and meta-analyses typically require significant time and effort. Machine learning models have the potential to make the screening phase of these processes more efficient. To evaluate such models effectively, fully labeled datasets, detailing every record screened by humans and the corresponding labeling decision, are imperative. This paper presents the creation of a comprehensive dataset for the systematic review of treatments for Borderline Personality Disorder reported by Oud et al. (2018), for use in a simulation study. The authors adhered to the PRISMA guidelines and published both the search query and the list of included records, but did not disclose the complete dataset with all labels. We replicated their search and, lacking the initial screening data, introduced a Noisy Label Filter (NLF) procedure that uses active learning to validate noisy labels. After applying the NLF, no further relevant records were found. A simulation study on the reconstructed dataset showed that active learning could reduce screening time by 82.30% compared with random reading. The paper discusses potential causes of the discrepancies, provides recommendations, and introduces a decision tree to assist in reconstructing datasets for running simulation studies.
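The abstract does not state which metric underlies the 82.30% figure. As an illustration only, the sketch below assumes Work Saved over Sampling (WSS@recall), a common way to express how much screening a ranking saves compared with random reading at a chosen recall level. It takes a fully labeled dataset in the order a model would present the records (1 = relevant, 0 = irrelevant) and counts how many records must be read. The function names and the toy label list are hypothetical and are not taken from the paper or from ASReview.

```python
import math
from typing import Sequence


def records_to_reach_recall(labels_in_screening_order: Sequence[int], recall: float) -> int:
    """Hypothetical helper: number of records a reviewer must screen, in the
    given order, to find at least `recall` of all relevant records."""
    n_relevant = sum(labels_in_screening_order)
    if n_relevant == 0:
        raise ValueError("dataset contains no relevant records")
    # Round before ceil so floating-point noise (e.g. 0.95 * 20) does not add a record.
    target = math.ceil(round(recall * n_relevant, 9))
    found = 0
    for position, label in enumerate(labels_in_screening_order, start=1):
        found += label
        if found >= target:
            return position
    raise ValueError("target recall not reached; recall must be in (0, 1]")


def work_saved_over_sampling(labels_in_screening_order: Sequence[int], recall: float = 0.95) -> float:
    """Assumed metric (WSS@recall): share of records left unscreened by the
    ranking, minus the share (1 - recall) random reading would leave unscreened."""
    n = len(labels_in_screening_order)
    screened = records_to_reach_recall(labels_in_screening_order, recall)
    return (n - screened) / n - (1.0 - recall)


if __name__ == "__main__":
    # Toy ordering (not data from the paper): relevant records appear at positions 1, 2 and 4.
    order = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(records_to_reach_recall(order, 1.0))
    print(round(work_saved_over_sampling(order, 1.0), 2))
```

With the toy ordering, all three relevant records are found after screening 4 of 10 records, so 60% of the work is saved at 100% recall. In a real simulation study the ordering would come from repeated runs of an active learning model over the reconstructed, fully labeled dataset.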