DNA storage

  • Article type: Journal Article
    With the exponential growth of digital data, there is a pressing need for innovative storage media and techniques. DNA molecules, due to their stability, storage capacity, and density, offer a promising solution for information storage. However, DNA storage also faces numerous challenges, such as complex biochemical constraints and encoding efficiency. This paper presents Explorer, a high-efficiency DNA coding algorithm based on the De Bruijn graph, which leverages the graph's capability to characterize local sequences. Explorer enables coding under various biochemical constraints, such as homopolymers, GC content, and undesired motifs. This paper also introduces Codeformer, a fast decoding algorithm based on the transformer architecture, to further enhance decoding efficiency. Numerical experiments indicate that, compared with other advanced algorithms, Explorer not only achieves stable encoding and decoding under various biochemical constraints but also increases the encoding efficiency and bit rate by >10%. Additionally, Codeformer demonstrates the ability to efficiently decode large quantities of DNA sequences. Under different parameter settings, its decoding efficiency exceeds that of traditional algorithms by more than two-fold. When Codeformer is combined with Reed-Solomon code, its decoding accuracy exceeds 99%, making it a good choice for high-speed decoding applications. These advancements are expected to contribute to the development of DNA-based data storage systems and the broader exploration of DNA as a novel information storage medium.
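
    The three biochemical constraints named in this abstract (homopolymers, GC content, undesired motifs) can be illustrated with a minimal sketch. This is not Explorer itself. The thresholds `max_run=3` and `gc_range=(0.4, 0.6)` and the example motif `GAATTC` are assumptions chosen for illustration only.

```python
def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of one repeated base in seq."""
    longest = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def satisfies_constraints(seq, max_run=3, gc_range=(0.4, 0.6),
                          forbidden=("GAATTC",)):
    """Check a candidate strand against the three constraint types.

    All default thresholds and the forbidden motif are hypothetical.
    """
    low, high = gc_range
    return (max_homopolymer_run(seq) <= max_run
            and low <= gc_content(seq) <= high
            and not any(motif in seq for motif in forbidden))
```

    A constrained encoder would only emit strands for which `satisfies_constraints` returns `True`; how Explorer enforces this through its De Bruijn graph is not described in the abstract.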

  • Article type: Journal Article
    Polymerase Chain Reaction (PCR) amplification is widely used for retrieving information from DNA storage. During the PCR amplification process, nonspecific pairing between the 3' end of the primer and the DNA sequence can cause cross-talk in the amplification reaction, leading to the generation of interfering sequences and reduced amplification accuracy. To address this issue, we propose an efficient coding algorithm for PCR amplification information retrieval (ECA-PCRAIR). This algorithm employs variable-length scanning and pruning optimization to construct a codebook that maximizes storage density while satisfying traditional biological constraints. Subsequently, a codeword search tree is constructed based on the primer library to optimize the codebook, and a variable-length interleaver is used for constraint detection and correction, thereby minimizing the likelihood of nonspecific pairing. Experimental results demonstrate that ECA-PCRAIR can reduce the probability of nonspecific pairing between the 3' end of the primer and the DNA sequence to 2-25%, enhancing the robustness of the DNA sequences. Additionally, ECA-PCRAIR achieves a storage density of 2.14-3.67 bits per nucleotide (bits/nt), significantly improving storage capacity.
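
    To make the nonspecific-pairing problem concrete, the sketch below scans a payload strand for sites that are exactly complementary to the last `k` bases of a primer. It is a simplified illustration, not the ECA-PCRAIR algorithm: the function names, the window length `k=5`, and the exact-match criterion are all assumptions, and real annealing also tolerates partial mismatches.

```python
# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_binding_sites(primer: str, target: str, k: int = 5):
    """Start positions in target that could pair with the primer's 3' end.

    The primer's last k bases anneal wherever the target contains their
    reverse complement. k is a hypothetical window length.
    """
    probe = reverse_complement(primer[-k:])
    return [i for i in range(len(target) - k + 1)
            if target[i:i + k] == probe]
```

    A codebook filter in this spirit would reject codewords for which `three_prime_binding_sites` returns a non-empty list for any primer in the library.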

  • Article type: Journal Article
    Today's digital data storage systems typically offer advanced data recovery solutions to address the problem of catastrophic data loss, such as software-based disk sector analysis or physical-level data retrieval methods for conventional hard disk drives. However, DNA-based data storage currently relies solely on the inherent error correction properties of the methods used to encode digital data into strands of DNA. Any error that cannot be corrected using the redundancy added by DNA encoding methods results in permanent data loss. To provide data recovery for DNA storage systems, we present a method to automatically reconstruct corrupted or missing data stored in DNA using fountain codes. Our method exploits the relationships between packets encoded with fountain codes to identify and rectify corrupted or lost data. Furthermore, we present file type-specific and content-based data recovery methods for three file types, illustrating how a fusion of fountain encoding-specific redundancy and knowledge about the data can effectively recover information in a corrupted DNA storage system, both automatically and in a guided manual manner. To demonstrate our approach, we introduce DR4DNA, a software toolkit that contains all methods presented. We evaluate DR4DNA using both in-silico and in-vitro experiments.
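
    The "relationships between packets" that fountain codes provide are XOR relations: each packet is the XOR of a subset of source chunks. The following is a minimal sketch of that idea with a basic peeling decoder. It is not the DR4DNA implementation; the function names are hypothetical, and the chunk subsets are passed in explicitly instead of being drawn from a degree distribution as a real fountain code would do.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_packet(chunks, indices):
    """Build one packet: the XOR of the chunks selected by indices."""
    payload = bytes(len(chunks[0]))
    for i in indices:
        payload = xor_bytes(payload, chunks[i])
    return (frozenset(indices), payload)


def peel_decode(packets, n_chunks):
    """Recover chunks by repeatedly resolving packets that cover one unknown chunk."""
    recovered = {}
    pending = [(set(idx), data) for idx, data in packets]
    progress = True
    while progress and len(recovered) < n_chunks:
        progress = False
        for k, (idx, data) in enumerate(pending):
            # Remove the contribution of every chunk that is already known.
            for i in list(idx):
                if i in recovered:
                    data = xor_bytes(data, recovered[i])
                    idx.discard(i)
            pending[k] = (idx, data)
            if len(idx) == 1:
                (i,) = idx
                if i not in recovered:
                    recovered[i] = data
                    progress = True
    return recovered
```

    If decoding stalls or two packets imply different values for the same chunk, the XOR relations point to which packets are suspect; that is the kind of redundancy the abstract describes exploiting for recovery.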

  • Article type: Journal Article
    DNA, as the storage medium in organisms, can address the shortcomings of existing electromagnetic storage media, such as low information density, high maintenance power consumption, and short storage time. Current research on DNA storage mainly focuses on designing encoders that convert binary data into DNA base data meeting biological constraints. We have created a new Chinese character code table that enables exceptionally high information storage density for Chinese characters (compared to traditional UTF-8 encoding). To meet biological constraints, we have devised a DNA shift coding scheme with low algorithmic complexity, which can encode any DNA strand, even one containing excessively long homopolymers. The designed DNA sequence will be stored in a double-stranded plasmid of 744 bp, ensuring high reliability during storage. Additionally, the plasmid's resistance to environmental interference ensures long-term stable information storage. Moreover, it can be replicated at a lower cost.
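
    A short calculation shows why a dedicated character table can be denser than UTF-8. The sketch below assumes an unconstrained mapping of 2 bits per nucleotide and a hypothetical table of 4096 characters; the size of the authors' actual code table is not given in the abstract.

```python
def nt_per_symbol(table_size: int, bits_per_nt: int = 2) -> int:
    """Fewest nucleotides whose codewords can index table_size symbols."""
    if table_size < 1:
        raise ValueError("table_size must be positive")
    n, capacity = 0, 1
    while capacity < table_size:
        capacity *= 2 ** bits_per_nt
        n += 1
    return n


def utf8_nt(text: str, bits_per_nt: int = 2) -> int:
    """Nucleotides needed to store text as raw UTF-8 bytes."""
    bits = 8 * len(text.encode("utf-8"))
    return -(-bits // bits_per_nt)  # ceiling division
```

    Under these assumptions a common Chinese character costs 12 nt as UTF-8 (3 bytes) but only `nt_per_symbol(4096)` = 6 nt when indexed in a 4096-entry table.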

  • Article type: Journal Article
    DNA is a high-density, long-term stable, and scalable storage medium that can meet the increased demands on storage media resulting from the exponential growth of data. Existing DNA storage encoding schemes tend to achieve high-density storage but do not fully consider the local and global stability of DNA sequences or the read and write accuracy of the stored information. To address these problems, this article presents a graph-based De Bruijn Trim Rotation Graph (DBTRG) encoding scheme. Through XOR between the proposed dynamic binary sequence and the original binary sequence, k-mers can be divided into the De Bruijn Trim graph, and the stored information can be compressed according to the overlapping relationship. Simulation results show that DBTRG ensures base balance and diversity, reduces the likelihood of undesired motifs, and improves the stability of DNA storage and data recovery. Furthermore, DBTRG maintains an encoding rate of 1.92 when storing 510 KB images and introduces novel approaches and concepts for DNA storage encoding.
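
    The XOR step can be sketched in isolation: mixing the payload bits with a second binary sequence before mapping bits to bases breaks up long runs in the input, and the same XOR undoes it on decoding. In this sketch the "dynamic binary sequence" is replaced by a seeded pseudo-random mask, which is an assumption; the De Bruijn Trim graph itself is not modelled, and the 2-bits-per-base mapping is a generic choice.

```python
import random

BASES = "ACGT"  # index 0..3 encodes the bit pairs 00, 01, 10, 11


def mask_bits(n: int, seed: int):
    """Reproducible pseudo-random mask (stand-in for the dynamic sequence)."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n)]


def encode(bits, seed: int = 0) -> str:
    """XOR the bits with the mask, then map each bit pair to one base."""
    if len(bits) % 2:
        raise ValueError("bit length must be even")
    mixed = [b ^ m for b, m in zip(bits, mask_bits(len(bits), seed))]
    return "".join(BASES[2 * mixed[i] + mixed[i + 1]]
                   for i in range(0, len(mixed), 2))


def decode(dna: str, seed: int = 0):
    """Map bases back to bit pairs and remove the mask with a second XOR."""
    mixed = []
    for base in dna:
        value = BASES.index(base)
        mixed += [value >> 1, value & 1]
    return [b ^ m for b, m in zip(mixed, mask_bits(len(mixed), seed))]
```

    Because XOR is its own inverse, `decode(encode(bits, seed), seed)` returns the original bits for any even-length input.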

  • Article type: Journal Article
    DNA is an incredibly dense storage medium for digital data. However, computing on the stored information is expensive and slow, requiring rounds of sequencing, in silico computation, and DNA synthesis. Prior work on accessing and modifying data using DNA hybridization or enzymatic reactions had limited computation capabilities. Inspired by the computational power of "DNA strand displacement," we augment DNA storage with "in-memory" molecular computation using strand displacement reactions to algorithmically modify data in a parallel manner. We show programs for binary counting and Turing universal cellular automaton Rule 110, the latter of which is, in principle, capable of implementing any computer algorithm. Information is stored in the nicks of DNA, and a secondary sequence-level encoding allows high-throughput sequencing-based readout. We conducted multiple rounds of computation on 4-bit data registers, as well as random access of data (selective access and erasure). We demonstrate that large strand displacement cascades with 244 distinct strand exchanges (sequential and in parallel) can use naturally occurring DNA sequence from M13 bacteriophage without stringent sequence design, which has the potential to improve the scale of computation and decrease cost. Our work merges DNA storage and DNA computing, setting the foundation of entirely molecular algorithms for parallel manipulation of digital information preserved in DNA.
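
    For readers unfamiliar with Rule 110, the update it applies to a register can be sketched in software. This models only the logical rule, not the strand displacement chemistry. Treating cells outside the register as 0 is an assumption, since the abstract does not state the boundary condition used.

```python
RULE = 110  # binary 01101110: bit n gives the output for neighborhood value n


def rule110_step(cells):
    """Apply one Rule 110 update to a list of 0/1 cells (zero boundary)."""
    padded = [0] + list(cells) + [0]
    nxt = []
    for i in range(1, len(padded) - 1):
        # Neighborhood (left, center, right) read as a 3-bit number.
        neighborhood = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        nxt.append((RULE >> neighborhood) & 1)
    return nxt


def run(cells, rounds):
    """Return the register after each of the given number of rounds."""
    history = []
    for _ in range(rounds):
        cells = rule110_step(cells)
        history.append(cells)
    return history
```

    Starting a 4-bit register at `[0, 0, 0, 1]`, three rounds give `[0, 0, 1, 1]`, `[0, 1, 1, 1]`, and `[1, 1, 0, 1]`.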

  • Article type: Journal Article
    With the informatization of social processes, the amount of related data has greatly increased, leaving traditional storage media unable to meet current requirements for data storage. Owing to its high storage capacity and persistence, deoxyribonucleic acid (DNA) is considered the most promising storage medium for solving the data storage problem. Synthesis is an important process for DNA storage, and low-quality DNA coding can increase errors during sequencing, which can affect the storage efficiency. To reduce errors caused by the poor stability of DNA sequences during storage, this paper proposes a method that uses the double-matching and error-pairing constraints to improve the quality of the DNA coding set. First, the double-matching and error-pairing constraints are defined to address sequences that undergo self-complementary reactions in solution and are prone to mismatch at the 3' end. In addition, two strategies are introduced into the arithmetic optimization algorithm: a random perturbation of the elementary function and a double adaptive weighting strategy. The resulting improved arithmetic optimization algorithm (IAOA) is used to construct DNA coding sets. The experimental results of the IAOA on 13 benchmark functions show a significant improvement in its exploration and development capabilities over the existing algorithms. Moreover, the IAOA is used in the DNA encoding design under both traditional and new constraints. The DNA coding sets are tested to estimate their quality regarding the number of hairpins and melting temperature. The DNA storage coding sets constructed in this study are improved by 77.7% at the lower boundary compared to existing algorithms. The DNA sequences in the storage sets show a reduction of 9.7-84.1% in the melting temperature variance, and the hairpin structure ratio is reduced by 2.1-80%. The results indicate that the stability of the DNA coding sets is improved under the two proposed constraints compared to traditional constraints.
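
    Melting temperature variance, one of the quality measures mentioned above, can be illustrated with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius. This rule is a rough approximation for short oligonucleotides and is used here only as a stand-in; the abstract does not say which melting temperature model the authors used.

```python
from statistics import pvariance


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


def tm_variance(sequences) -> float:
    """Population variance of the estimated Tm across a coding set."""
    return pvariance(wallace_tm(seq) for seq in sequences)
```

    A coding set with lower `tm_variance` anneals more uniformly, which is the sense in which reduced variance indicates better stability.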

  • Article type: Journal Article
    Introduction: Rapid development in synthetic technologies has boosted DNA as a potential medium for large-scale data storage. Meanwhile, how to implement data security in DNA storage systems remains an unsolved problem. Methods: In this article, we propose an image encryption method based on the modulation-based storage architecture. The key idea is to take advantage of the unpredictable modulation signals to encrypt images in highly error-prone DNA storage channels. Results and Discussion: Numerical results demonstrate that our image encryption method is feasible and effective, with excellent security against various attacks (statistical, differential, noise, and data loss). Compared with other methods, such as those based on the hybridization reactions of DNA molecules, the proposed method is more reliable and feasible for large-scale applications.

  • Article type: Journal Article
    Synthetic biology provides a new paradigm for life science research ("build to learn") and opens the future journey of biotechnology ("build to use"). Here, we discuss advances in the principles and technologies that form the mainstream enabling technology of synthetic biology, including synthesis and assembly of a genome, DNA storage, gene editing, molecular evolution and de novo design of functional proteins, cell and gene circuit engineering, cell-free synthetic biology, artificial intelligence (AI)-aided synthetic biology, as well as biofoundries. We also introduce the concept of quantitative synthetic biology, which is guiding synthetic biology towards increased accuracy and predictability, or truly rational design. We conclude that synthetic biology will establish its disciplinary system with the iterative development of enabling technologies and the maturity of the core theory.

  • Article type: Journal Article
    The rapid development of green and sustainable materials opens up new possibilities in the field of applied research. Such materials include nanocellulose composites that can integrate many components into composites and provide a good chassis for smart devices. In our study, we evaluate four approaches for turning a nanocellulose composite into an information storage or processing device: 1) Nanocellulose can be a suitable carrier material and protect information stored in DNA. 2) Nucleotide-processing enzymes (polymerase and exonuclease) can be controlled by light after fusing them with light-gating domains; nucleotide substrate specificity can be changed by mutation or pH change (read-in and read-out of the information). 3) Semiconductors and electronic capabilities can be achieved: we show that nanocellulose is rendered electronic by iodine treatment replacing silicon including microstructures. Nanocellulose semiconductor properties are measured, and the resulting potential including single-electron transistors (SET) and their properties are modeled. Electric current can also be transported by DNA through G-quadruplex DNA molecules; these as well as classical silicon semiconductors can easily be integrated into the nanocellulose composite. 4) To elaborate upon miniaturization and integration for a smart nanocellulose chip device, we demonstrate pH-sensitive dyes in nanocellulose, nanopore creation, and kinase micropatterning on bacterial membranes as well as digital PCR micro-wells. Future application potential includes nano-3D printing and fast molecular processors (e.g., SETs) integrated with DNA storage and conventional electronics. This would also lead to environment-friendly nanocellulose chips for information processing as well as smart nanocellulose composites for biomedical applications and nano-factories.