One would assume that in silico experiments in systems biology are less susceptible to reproducibility issues than their wet-lab counterparts, because they are free from natural biological variation and their environment can be fully controlled. However, recent studies show that only half of the published mathematical models of biological systems can be reproduced without substantial effort. In this article we examine the potential causes of failed or cumbersome reproductions in a case study of a one-dimensional mathematical model of the atrioventricular node, which took us four months to reproduce. The model demonstrates that even otherwise rigorous studies can be hard to reproduce due to missing information, errors in equations and parameters, a lack of available data files, non-executable code, missing or incomplete experiment protocols, and missing rationales behind equations. Many of these issues seem similar to problems that have been solved in software engineering using techniques such as unit testing, regression testing, continuous integration, version control, archival services, and a thorough modular design with extensive documentation. Applying these techniques, we reimplement the examined model using the modeling language Modelica. The resulting workflow is independent of the model and can be translated to SBML, CellML, and other languages. It guarantees methods reproducibility by executing automated tests in a virtual machine on a server that is physically separated from the development environment. Additionally, it facilitates results reproducibility, because the model is more understandable and because the complete model code, experiment protocols, and simulation data are published and can be accessed in the exact version that was used in this article. We found the additional design and documentation effort well justified, even when considering only the immediate benefits during development, such as easier and faster debugging, increased understandability of equations, and a reduced need to look up details in the literature.