Keywords: Accessibility; Code sharing; Data sharing; Open-access; Open-source; Reproducibility; Transparency

Source: DOI: 10.7717/peerj-cs.2066

Abstract:
Data-driven computational analysis is becoming increasingly important in biomedical research as the amount of data being generated continues to grow. However, the lack of practices for sharing research outputs, such as data, source code, and methods, undermines the transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible because insufficient documentation, code, and data are shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them failed to share their analytical code. Even among those that did disclose their code, the vast majority failed to provide additional research outputs, such as data. Furthermore, only one in ten articles organized their code in a structured and reproducible manner. We found a significant association between the presence of code availability statements and increased code availability. Additionally, studies conducting secondary analyses were more likely to share their code than those conducting primary analyses. In light of our findings, we propose raising awareness of code-sharing practices and taking immediate steps to enhance code availability in order to improve reproducibility in biomedical research. By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discovery. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.
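The abstract reports its main result as a percentage (50.1% of 453 manuscripts). Because that figure is given to one decimal place, the whole-number count behind it can be recovered. The small Python check below is an illustration only and not part of the study's methods. The helper name `counts_matching` is hypothetical. It tries every possible count k and keeps those for which 100·k/453 rounds to 50.1. Exactly one value fits: k = 227.

```python
# Illustrative check (hypothetical helper, not from the study): recover the
# integer count(s) consistent with a percentage reported to one decimal place.
def counts_matching(percent: float, total: int) -> list[int]:
    """Return every integer k in [0, total] whose 100*k/total rounds to `percent`."""
    return [k for k in range(total + 1) if round(100 * k / total, 1) == percent]


print(counts_matching(50.1, 453))  # → [227]
```

Neighbouring counts round to different values (226/453 gives 49.9%, 228/453 gives 50.3%), so the reported 50.1% corresponds to 227 of the 453 manuscripts.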