DNA storage

  • Article type: Journal Article
    With the informatization of society, the volume of data has grown sharply, and traditional storage media can no longer meet current data-storage requirements. Because of its high storage capacity and durability, deoxyribonucleic acid (DNA) is considered the most promising medium for solving the data storage problem. Synthesis is a key step in DNA storage, and low-quality DNA coding can increase errors during sequencing, which reduces storage efficiency. To reduce errors caused by the poor stability of DNA sequences during storage, this paper proposes a method that uses double-matching and error-pairing constraints to improve the quality of DNA coding sets. First, the double-matching and error-pairing constraints are defined to address sequences in the solution that undergo self-complementary reactions and are prone to mismatches at the 3' end. Second, two strategies are introduced into the arithmetic optimization algorithm: random perturbation of the elementary functions and a double adaptive weighting strategy. The resulting improved arithmetic optimization algorithm (IAOA) is used to construct DNA coding sets. Experiments on 13 benchmark functions show that the IAOA significantly improves exploration and exploitation capabilities over existing algorithms. The IAOA is then applied to DNA coding design under both the traditional and the new constraints, and the resulting coding sets are evaluated by their number of hairpins and their melting temperature. Compared with existing algorithms, the lower bound of the DNA storage coding sets constructed in this study is improved by 77.7%. The DNA sequences in these sets show a 9.7% to 84.1% reduction in melting temperature variance and a 2.1% to 80% reduction in the hairpin structure ratio. The results indicate that the DNA coding sets are more stable under the two proposed constraints than under the traditional constraints.
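    The abstract does not define the double-matching and error-pairing constraints, so they are not reproduced here. As a hedged illustration of the kind of checks a DNA coding set goes through, the Python sketch below tests a candidate set against constraints commonly used in DNA coding design (GC content, pairwise Hamming distance, and reverse-complement Hamming distance) and computes a hairpin ratio with a simplified stem-loop criterion. All function names, the 45% to 55% GC window, and the stem and loop lengths are illustrative assumptions, not the paper's parameters.

```python
from itertools import combinations, combinations_with_replacement

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement, e.g. AACG -> CGTT."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def satisfies_constraints(codes, min_dist, gc_range=(0.45, 0.55)):
    """Check GC content, Hamming distance, and reverse-complement distance.

    Hypothetical helper: thresholds are illustrative, not taken from the paper.
    """
    low, high = gc_range
    if not all(low <= gc_content(s) <= high for s in codes):
        return False
    # Hamming constraint: distinct codewords must differ in >= min_dist positions.
    if any(hamming(a, b) < min_dist for a, b in combinations(codes, 2)):
        return False
    # Reverse-complement constraint, including each word against itself,
    # which rejects self-complementary sequences.
    return all(
        hamming(a, reverse_complement(b)) >= min_dist
        for a, b in combinations_with_replacement(codes, 2)
    )


def has_hairpin(seq, stem=3, min_loop=3):
    """Simplified check: a stem of `stem` bases pairs (antiparallel) with a
    downstream stem, separated by a loop of at least `min_loop` bases."""
    n = len(seq)
    for i in range(n - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for j in range(i + stem + min_loop, n - stem + 1):
            if left == reverse_complement(seq[j:j + stem]):
                return True
    return False


def hairpin_ratio(codes, stem=3, min_loop=3):
    """Fraction of sequences in the set that contain a hairpin."""
    return sum(has_hairpin(s, stem, min_loop) for s in codes) / len(codes)


if __name__ == "__main__":
    print(satisfies_constraints(["ACGTACGT"], min_dist=3))              # False
    print(satisfies_constraints(["ACGGTCAA", "TGCATCGA"], min_dist=4))  # True
    print(hairpin_ratio(["GGGAAAACCC", "ACGGTCAA"]))                    # 0.5
```

    ACGTACGT is its own reverse complement, so the reverse-complement check rejects it; self-complementary sequences of this kind are the problem the abstract's new constraints target. A real design would add further checks, such as a thermodynamic melting-temperature model, which this sketch omits.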

  • Article type: Journal Article
    Synchronization errors (insertions and deletions) remain a major challenge for reliable information retrieval in DNA storage. Unlike traditional error-correcting codes (ECC), which add redundancy to the stored information, multiple sequence alignment (MSA) addresses this problem by searching for conserved subsequences. In this paper, we conduct a comprehensive simulation study of the error-correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition at an error rate of around 20%. Below this critical value, increasing the sequencing depth eventually allows it to approach complete recovery; above it, its performance plateaus at a poor level. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low-error regime and correct 90% of the errors in the medium-error regime. MSA is also robust to imperfect clustering, and it could be combined with other techniques such as ECC, repeated markers, or other coding constraints. Furthermore, by choosing an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could therefore be a competitive alternative for future DNA storage.
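    To show why insertions and deletions are called synchronization errors, the Python sketch below (an illustration, not the paper's simulation code) passes a sequence through a simple insertion/deletion/substitution channel and compares position-wise mismatches with the Levenshtein edit distance. The equal three-way split of the error rate and all function names are assumptions. A single deletion shifts every later base, so a position-wise comparison sees almost every position as wrong while the edit distance is 1; this is why reads must be aligned, as MSA tools such as MAFFT do, before a consensus can be taken. MAFFT itself is an external program and is not called here.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def positional_mismatches(a: str, b: str) -> int:
    """Position-wise mismatches; extra bases in the longer string count as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def ids_channel(seq: str, error_rate: float, rng: random.Random) -> str:
    """Corrupt each base with total probability error_rate, split equally
    between deletion, insertion, and substitution (an assumed error model)."""
    out = []
    for base in seq:
        r = rng.random()
        if r < error_rate / 3:            # deletion: drop the base
            continue
        if r < 2 * error_rate / 3:        # insertion: random base before this one
            out.append(rng.choice("ACGT"))
            out.append(base)
        elif r < error_rate:              # substitution: a different base
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:                             # no error
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    shifted = ref[1:]  # one deletion at the first position
    print(positional_mismatches(ref, shifted), levenshtein(ref, shifted))  # 10 1

    # Simulate a handful of noisy reads of the same reference (sequencing depth 5).
    rng = random.Random(42)
    for read in (ids_channel(ref, 0.2, rng) for _ in range(5)):
        print(read, levenshtein(ref, read))
```

    The error rate and the number of simulated reads play the roles of the error regime and sequencing depth discussed in the abstract; aligning such reads and voting column by column is the step MSA performs.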