Keywords: Coding constraint; DNA coding; DNA storage; Frequency matrix game graph; High-quality coding

MeSH: Algorithms; DNA / genetics, chemistry; Sequence Analysis, DNA / methods; Information Storage and Retrieval; Carbon

Source: DOI:10.1016/j.compbiomed.2022.106269

Abstract:
Storing data in complex biomolecules is a new carbon-based storage method. DNA, for example, is a promising medium for long-term archival data storage. Reasonable and efficient coding is the first and most important step in DNA storage. However, current coding methods, such as the altruism algorithm, suffer from low coding efficiency and high complexity, and their coding constraints and code sets make the coding results difficult to inspect visually. This study proposes a new DNA storage coding method based on the frequency matrix game graph (FMG) to generate DNA storage codes that satisfy combinatorial constraints. In contrast to the randomness of heuristic algorithms that satisfy the constraints, the FMG-based coding method is deterministic and can clearly explain the coding process. In addition, the constraints and coding results have observable characteristics, and the size of the resulting coding set improves on previously published results. For example, for code length n = 10 and Hamming distance d = 4, the results obtained by the proposed approach, which combines the chaos game with a graph, are 24% better than the previous results. The proposed coding scheme successfully constructs high-quality coding sets with lower complexity, which effectively promotes the development of carbon-based storage coding.
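To make the distance constraint mentioned in the abstract concrete, the following is a minimal sketch of how one might check that a candidate set of DNA codewords of length n = 10 has pairwise Hamming distance at least d = 4. It is not the FMG method itself, which the abstract does not describe in detail. The function names and the three example codewords are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codewords: list[str]) -> int:
    """Smallest Hamming distance over all unordered pairs in the set."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


def satisfies_distance(codewords: list[str], d: int) -> bool:
    """True if every pair of codewords is at least distance d apart."""
    return min_pairwise_distance(codewords) >= d


# Hypothetical codewords of length n = 10 (illustration only, not from the paper).
example_set = ["ACGTACGTAC", "TGCATGCATG", "ACGTTGCAAC"]

print(min_pairwise_distance(example_set))
print(satisfies_distance(example_set, 4))
```

A larger coding set under the same n and d means more distinct codewords pass this check simultaneously, which is the quantity the abstract reports as improved. Any further combinatorial constraints would be added as separate predicates alongside `satisfies_distance`.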