DNA storage

  • Article type: Journal Article
    Recent advances in synthesis and sequencing techniques have made deoxyribonucleic acid (DNA) a promising alternative for next-generation digital storage. As it approaches practical application, ensuring the security of DNA-stored information has become a critical problem. Deniable encryption allows different information to be decrypted from the same ciphertext, so that "plausible" fake information can be provided when users are coerced into revealing the real information. In this paper, we propose a deniable encryption method that uniquely leverages DNA noise channels. Specifically, true and fake messages are encrypted by two similar modulation carriers and subsequently obfuscated by inherent errors. Experimental results demonstrate that our method not only conceals true information among fake information indistinguishably, but also allows both the coercive adversary and the legitimate receiver to decrypt their intended information accurately. Further security analysis validates the resistance of our method against various typical attacks. Compared with conventional DNA cryptography methods based on complex biological operations, our method offers superior practicality and reliability, positioning it as an ideal solution for data encryption in future large-scale DNA storage applications.

  • Article type: Journal Article
    With digital transformation and the widespread application of new technologies, data storage faces new challenges from the demand for high-density storage of massive amounts of information. In response, DNA storage technology has emerged as a promising research direction. Efficient and reliable data retrieval is critical for DNA storage, and the development of random access technology plays a key role in its practicality and reliability. However, achieving fast and accurate random access has proven difficult for existing DNA storage efforts, which limits their practical application in industry. In this review, we summarize recent advances in DNA storage technology that enable random access functionality, as well as the challenges that need to be overcome and the current solutions. This review aims to help researchers in the field of DNA storage better understand the importance of the random access step and its impact on the overall development of DNA storage. Furthermore, the remaining challenges and future research trends in random access technology for DNA storage are discussed, with the goal of providing a solid foundation for achieving random access in DNA storage under large-scale data conditions.

  • Article type: Journal Article
    With the exponential growth of digital data, there is a pressing need for innovative storage media and techniques. DNA molecules, due to their stability, storage capacity, and density, offer a promising solution for information storage. However, DNA storage also faces numerous challenges, such as complex biochemical constraints and encoding efficiency. This paper presents Explorer, a high-efficiency DNA coding algorithm based on the De Bruijn graph, which leverages the graph's capability to characterize local sequences. Explorer enables coding under various biochemical constraints, such as homopolymers, GC content, and undesired motifs. This paper also introduces Codeformer, a fast decoding algorithm based on the transformer architecture, to further enhance decoding efficiency. Numerical experiments indicate that, compared with other advanced algorithms, Explorer not only achieves stable encoding and decoding under various biochemical constraints but also increases the encoding efficiency and bit rate by more than 10%. Additionally, Codeformer demonstrates the ability to efficiently decode large quantities of DNA sequences. Under different parameter settings, its decoding efficiency exceeds that of traditional algorithms by more than two-fold. When Codeformer is combined with Reed-Solomon code, its decoding accuracy exceeds 99%, making it a good choice for high-speed decoding applications. These advancements are expected to contribute to the development of DNA-based data storage systems and the broader exploration of DNA as a novel information storage medium.
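The three biochemical constraints named in this abstract (homopolymers, GC content, undesired motifs) can be illustrated with a small validity check. This is only a sketch of what such constraints mean for a candidate sequence, not the Explorer algorithm itself; the function names and the thresholds (`max_run=3`, a GC range of 40-60%) are assumptions chosen for illustration and do not come from the paper.

```python
import re


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of a single repeated base."""
    longest = 0
    for match in re.finditer(r"(.)\1*", seq):
        longest = max(longest, len(match.group(0)))
    return longest


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def satisfies_constraints(seq, max_run=3, gc_range=(0.4, 0.6), forbidden_motifs=()):
    """Check a sequence against illustrative (assumed) constraint thresholds."""
    if max_homopolymer_run(seq) > max_run:
        return False
    low, high = gc_range
    if not low <= gc_content(seq) <= high:
        return False
    # Reject the sequence if any undesired motif occurs anywhere in it.
    return not any(motif in seq for motif in forbidden_motifs)


if __name__ == "__main__":
    print(satisfies_constraints("ACGTACGTAC"))
    print(satisfies_constraints("AAAACGCG"))
    print(satisfies_constraints("ACGTACGTAC", forbidden_motifs=("GTAC",)))
```

A constrained encoder would have to emit only sequences for which a check of this kind passes; how Explorer achieves that with a De Bruijn graph is not described in the abstract.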

  • Article type: Journal Article
    Polymerase chain reaction (PCR) amplification is widely used for retrieving information from DNA storage. During PCR amplification, nonspecific pairing between the 3' end of the primer and the DNA sequence can cause cross-talk in the amplification reaction, leading to the generation of interfering sequences and reduced amplification accuracy. To address this issue, we propose an efficient coding algorithm for PCR amplification information retrieval (ECA-PCRAIR). This algorithm employs variable-length scanning and pruning optimization to construct a codebook that maximizes storage density while satisfying traditional biological constraints. Subsequently, a codeword search tree is constructed based on the primer library to optimize the codebook, and a variable-length interleaver is used for constraint detection and correction, thereby minimizing the likelihood of nonspecific pairing. Experimental results demonstrate that ECA-PCRAIR can reduce the probability of nonspecific pairing between the 3' end of the primer and the DNA sequence to 2-25%, enhancing the robustness of the DNA sequences. Additionally, ECA-PCRAIR achieves a storage density of 2.14-3.67 bits per nucleotide (bits/nt), significantly improving storage capacity.
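The nonspecific pairing described in this abstract can be made concrete with a simple scan for places where the 3' end of a primer is perfectly complementary to a stored sequence. The sketch below is an assumption-laden illustration, not ECA-PCRAIR: it considers only exact Watson-Crick matches of the last `k` primer bases against a single strand written 5' to 3', and the function names and the default `k=5` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_binding_sites(primer: str, template: str, k: int = 5) -> list:
    """Find positions in `template` where the last k primer bases could anneal.

    The primer's 3' tail pairs with the template wherever the template
    contains the reverse complement of that tail. Overlapping hits are kept.
    """
    site = reverse_complement(primer[-k:])
    positions = []
    start = template.find(site)
    while start != -1:
        positions.append(start)
        start = template.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Tail "AGGC" pairs with "GCCT", which occurs twice in this template.
    print(three_prime_binding_sites("GATTACAGGC", "TTGCCTAAGCCT", k=4))
```

Any hit outside the intended priming site is a candidate for the cross-talk the abstract describes; a coding scheme that avoids such hits would aim to keep codewords from containing these sites.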

  • Article type: Journal Article
    The exponential growth in data volume has necessitated the adoption of alternative storage solutions, and DNA storage stands out as the most promising one. However, the exorbitant costs associated with synthesis and sequencing have impeded its development. Pre-compressing the data is recognized as one of the most effective approaches for reducing storage costs. However, different compression methods yield varying compression ratios for the same file, and compressing a large number of files with a single method may not achieve the maximum compression ratio. This study proposes a multi-file dynamic compression method based on machine learning classification algorithms that selects the appropriate compression method for each file to minimize the amount of data stored in DNA. First, four different compression methods are applied to the collected files. Subsequently, the optimal compression method for each file is used as the label, and the file type and size are used as features, for training seven machine learning classification algorithms. The results demonstrate that k-nearest neighbor outperforms the other machine learning algorithms on the validation set and test set most of the time, achieving an accuracy of over 85% and showing less volatility. Additionally, the k-nearest neighbor model achieves a compression rate of 30.85%, exceeding the traditional single compression method by more than 4.5% and resulting in significant DNA storage cost savings in the range of $0.48 to 3 billion/TB. Compared with the traditional compression method, the multi-file dynamic compression method demonstrates a more significant compression effect when compressing multiple files. Therefore, it can considerably decrease the cost of DNA storage and facilitate the widespread implementation of DNA storage technology.
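The labelling step described in this abstract (try several compression methods on each file and record the best one) can be sketched with Python's standard library. The abstract does not name the four methods used, so the three compressors below (`zlib`, `bz2`, `lzma`) are stand-ins chosen only because they ship with Python; `best_method` and `label_files` are hypothetical helper names.

```python
import bz2
import lzma
import zlib

# Stand-in compressors; the study's actual four methods are not named in the abstract.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def best_method(data: bytes):
    """Return (method_name, compressed_size) for the smallest output."""
    sizes = {name: len(compress(data)) for name, compress in COMPRESSORS.items()}
    name = min(sizes, key=sizes.get)
    return name, sizes[name]


def label_files(files: dict) -> dict:
    """Map each file name to the name of its best compression method.

    `files` maps a file name to its raw bytes. The resulting labels are what
    a classifier would be trained to predict from file type and size.
    """
    return {name: best_method(data)[0] for name, data in files.items()}


if __name__ == "__main__":
    sample = {"text.txt": b"abc" * 1000, "zeros.bin": bytes(4096)}
    print(label_files(sample))
```

Once a classifier predicts the label from cheap features such as file type and size, each file can be compressed once with the predicted method instead of with every method.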

  • Article type: Journal Article
    Robust encapsulation and controllable release of biomolecules have wide biomedical applications, ranging from biosensing and drug delivery to information storage. However, conventional biomolecule encapsulation strategies are limited by complicated operations, optical instability, and difficulty in decapsulation. Here, we report a simple, robust, and solvent-free biomolecule encapsulation strategy based on gallium liquid metal, which features low-temperature phase transition, self-healing, high hermetic sealing, and intrinsic resistance to optical damage. We sandwiched the biomolecules between solid gallium films and then welded the films at low temperature for direct sealing. The gallium not only protects DNA and enzymes from various kinds of physical and chemical damage but also allows the on-demand release of biomolecules by applying vibration to break the liquid gallium. We demonstrated that a DNA-coded image file can be recovered with up to 99.9% sequence retention after an accelerated aging test. We also showed the practical application of the controllable release of bioreagents in a one-pot RPA-CRISPR/Cas12a reaction for SARS-CoV-2 screening, with a low detection limit of 10 copies within 40 min. This work may facilitate the development of robust and stimuli-responsive biomolecule capsules by using low-melting metals for biotechnology.

  • Article type: Journal Article
    Today's digital data storage systems typically offer advanced data recovery solutions to address the problem of catastrophic data loss, such as software-based disk sector analysis or physical-level data retrieval methods for conventional hard disk drives. However, DNA-based data storage currently relies solely on the inherent error correction properties of the methods used to encode digital data into strands of DNA. Any error that cannot be corrected using the redundancy added by DNA encoding methods results in permanent data loss. To provide data recovery for DNA storage systems, we present a method to automatically reconstruct corrupted or missing data stored in DNA using fountain codes. Our method exploits the relationships between packets encoded with fountain codes to identify and rectify corrupted or lost data. Furthermore, we present file type-specific and content-based data recovery methods for three file types, illustrating how a fusion of fountain encoding-specific redundancy and knowledge about the data can effectively recover information in a corrupted DNA storage system, both automatically and in a guided manual manner. To demonstrate our approach, we introduce DR4DNA, a software toolkit that contains all methods presented. We evaluate DR4DNA using both in-silico and in-vitro experiments.
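The "relationships between packets" that this abstract refers to come from the way fountain codes build each packet as the XOR of a subset of source chunks. The sketch below shows that relationship with a minimal peeling decoder. It is not DR4DNA and not a full LT code: real fountain codes draw the chunk subsets pseudo-randomly from a degree distribution, whereas here the subsets are chosen by hand, and `make_packet` and `peel_decode` are hypothetical names.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_packet(chunks, indices):
    """Build one packet: the XOR of the chunks at the given indices."""
    payload = chunks[indices[0]]
    for i in indices[1:]:
        payload = xor_bytes(payload, chunks[i])
    return list(indices), payload


def peel_decode(packets, n_chunks):
    """Recover chunks by repeatedly resolving packets with one unknown chunk.

    Returns a dict {chunk_index: chunk_bytes}; it may be incomplete if the
    received packets are insufficient for peeling.
    """
    recovered = {}
    pending = [(set(indices), payload) for indices, payload in packets]
    progress = True
    while progress and len(recovered) < n_chunks:
        progress = False
        still_pending = []
        for indices, payload in pending:
            # Strip every already-recovered chunk out of this packet.
            known = {i for i in indices if i in recovered}
            for i in known:
                payload = xor_bytes(payload, recovered[i])
            unknown = indices - known
            if len(unknown) == 1:
                (i,) = unknown
                recovered[i] = payload
                progress = True
            elif unknown:
                still_pending.append((unknown, payload))
        pending = still_pending
    return recovered


if __name__ == "__main__":
    chunks = [b"AB", b"CD", b"EF"]
    packets = [make_packet(chunks, idx) for idx in ([0], [0, 1], [1, 2])]
    print(peel_decode(packets, len(chunks)))
```

Because every chunk contributes to several packets, a corrupted packet tends to produce results that are inconsistent with the others, which is the kind of redundancy a recovery method can exploit.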

  • Article type: Journal Article
    Environmental DNA (eDNA) workflows contain many familiar molecular-lab techniques, but also employ several unique methodologies. When working with eDNA, it is essential to avoid contamination from the point of collection through preservation and to select a meaningful negative control. As eDNA can be obtained from a variety of samples and habitats (e.g., soil, water, air, or tissue), protocols will vary depending on usage. Samples may require additional steps to dilute, block, or remove inhibitors, or to physically break up samples or filters. Thereafter, standard DNA isolation techniques (kit-based or phenol:chloroform:isoamyl alcohol [PCI]) are employed. Once DNA is extracted, it is typically quantified using a fluorometer. Yields vary greatly, but are important to know prior to amplification of the gene(s) of interest. Long-term storage of both the sampled material and the extracted DNA is encouraged, as it provides a backup for spilled or contaminated samples, lost data, reanalysis, and future studies using newer technology. Storage in a freezer is often ideal; however, some storage buffers (e.g., Longmire's) require that filters or swabs be kept at room temperature to prevent precipitation of buffer-related solutes. These baseline methods for eDNA isolation, validation, and preservation are detailed in this protocol chapter. In addition, we outline a cost-effective, homebrew extraction protocol optimized to extract eDNA.

  • Article type: Journal Article
    In the absence of a DNA template, the ab initio production of long double-stranded DNA molecules of predefined sequence is particularly challenging. The DNA synthesis step remains a bottleneck for many applications, such as functional assessment of ancestral genes, analysis of alternative splicing, or DNA-based data storage. In this report, we propose a fully in vitro protocol to generate very long double-stranded DNA molecules, starting from commercially available short DNA blocks, in less than 3 days using Golden Gate assembly. This innovative application allowed us to streamline the process to produce a 24 kb-long DNA molecule storing part of the Declaration of the Rights of Man and of the Citizen of 1789. The DNA molecule produced can be readily cloned into a suitable host/vector system for amplification and selection.

  • Article type: Journal Article
    Due to its high information density, DNA is very attractive as a data storage system. However, a major obstacle is the high cost and long turnaround time for retrieving DNA data with next-generation sequencing. Herein, the use of a microfluidic very large-scale integration (mVLSI) platform is described to perform highly parallel and rapid readout of data stored in DNA. Additionally, it is demonstrated that multi-state data encoded in DNA can be deciphered with on-chip melt-curve analysis, thereby further increasing the data content that can be analyzed. The pairing of mVLSI network architecture with exquisitely specific DNA recognition gives rise to a scalable platform for rapid DNA data reading.