DNA storage

  • Article type: Journal Article
    Recent advancements in synthesis and sequencing techniques have made deoxyribonucleic acid (DNA) a promising alternative for next-generation digital storage. As it approaches practical application, ensuring the security of DNA-stored information has become a critical problem. Deniable encryption allows different information to be decrypted from the same ciphertext, so that "plausible" fake information can be provided when users are coerced into revealing the real information. In this paper, we propose a deniable encryption method that uniquely leverages DNA noise channels. Specifically, true and fake messages are encrypted by two similar modulation carriers and subsequently obfuscated by inherent errors. Experimental results demonstrate that our method not only conceals true information indistinguishably among fake information but also allows both the coercive adversary and the legitimate receiver to decrypt their intended information accurately. Further security analysis validates the resistance of our method against various typical attacks. Compared with conventional DNA cryptography methods based on complex biological operations, our method offers superior practicality and reliability, positioning it as an ideal solution for data encryption in future large-scale DNA storage applications.

  • Article type: Journal Article
    With digital transformation and the widespread application of new technologies, data storage faces new challenges from the demand for high-density storage of massive amounts of information. In response, DNA storage technology has emerged as a promising research direction. Efficient and reliable data retrieval is critical for DNA storage, and the development of random access technology plays a key role in its practicality and reliability. However, achieving fast and accurate random access has proven difficult for existing DNA storage efforts, which limits their practical application in industry. In this review, we summarize recent advances in DNA storage technology that enable random access functionality, as well as the challenges that need to be overcome and the current solutions. This review aims to help researchers in the field of DNA storage better understand the importance of the random access step and its impact on the overall development of DNA storage. Furthermore, the remaining challenges and future research trends in random access technology for DNA storage are discussed, with the goal of providing a solid foundation for achieving random access in DNA storage under large-scale data conditions.

  • Article type: Journal Article
    With the exponential growth of digital data, there is a pressing need for innovative storage media and techniques. DNA molecules, due to their stability, storage capacity, and density, offer a promising solution for information storage. However, DNA storage also faces numerous challenges, such as complex biochemical constraints and encoding efficiency. This paper presents Explorer, a high-efficiency DNA coding algorithm based on the De Bruijn graph, which leverages the graph's capability to characterize local sequences. Explorer enables coding under various biochemical constraints, such as homopolymers, GC content, and undesired motifs. This paper also introduces Codeformer, a fast decoding algorithm based on the transformer architecture, to further enhance decoding efficiency. Numerical experiments indicate that, compared with other advanced algorithms, Explorer not only achieves stable encoding and decoding under various biochemical constraints but also increases the encoding efficiency and bit rate by more than 10%. Additionally, Codeformer demonstrates the ability to efficiently decode large quantities of DNA sequences. Under different parameter settings, its decoding efficiency exceeds that of traditional algorithms by more than two-fold. When Codeformer is combined with a Reed-Solomon code, its decoding accuracy exceeds 99%, making it a good choice for high-speed decoding applications. These advancements are expected to contribute to the development of DNA-based data storage systems and the broader exploration of DNA as a novel information storage medium.
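
The three biochemical constraints named in this abstract (homopolymers, GC content, undesired motifs) can be illustrated with a small validity check. This is a minimal sketch and not the Explorer algorithm itself: the function names, the default thresholds (runs of at most 3, GC content between 40% and 60%), and the example motif are assumptions chosen for illustration.

```python
def max_homopolymer_run(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = 0
    run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def satisfies_constraints(seq, max_run=3, gc_min=0.4, gc_max=0.6, forbidden=()):
    """Check a candidate strand against illustrative (assumed) thresholds."""
    if max_homopolymer_run(seq) > max_run:
        return False
    if not gc_min <= gc_content(seq) <= gc_max:
        return False
    # Reject the strand if any undesired motif occurs as a substring.
    return not any(motif in seq for motif in forbidden)


if __name__ == "__main__":
    print(satisfies_constraints("ACGTACGT"))                       # balanced, no long runs
    print(satisfies_constraints("AAAAACGT"))                       # homopolymer too long
    print(satisfies_constraints("ACGTACGT", forbidden=("GTAC",)))  # contains a motif
```

A constrained encoder would only emit strands for which such a check passes; how Explorer enforces this through the De Bruijn graph is not described in the abstract.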

  • Article type: Journal Article
    Polymerase chain reaction (PCR) amplification is widely used for retrieving information from DNA storage. During PCR amplification, nonspecific pairing between the 3' end of the primer and the DNA sequence can cause cross-talk in the amplification reaction, leading to the generation of interfering sequences and reduced amplification accuracy. To address this issue, we propose an efficient coding algorithm for PCR amplification information retrieval (ECA-PCRAIR). This algorithm employs variable-length scanning and pruning optimization to construct a codebook that maximizes storage density while satisfying traditional biological constraints. Subsequently, a codeword search tree is constructed based on the primer library to optimize the codebook, and a variable-length interleaver is used for constraint detection and correction, thereby minimizing the likelihood of nonspecific pairing. Experimental results demonstrate that ECA-PCRAIR can reduce the probability of nonspecific pairing between the 3' end of the primer and the DNA sequence to 2-25%, enhancing the robustness of the DNA sequences. Additionally, ECA-PCRAIR achieves a storage density of 2.14-3.67 bits per nucleotide (bits/nt), significantly improving storage capacity.
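
The kind of nonspecific pairing this abstract describes can be pictured with a simple screening check: look for places in a payload where the last few bases of a primer could anneal. This is a rough sketch under stated assumptions, not ECA-PCRAIR. The function names, the tail length `k`, and the exact-match criterion are hypothetical; real annealing also depends on mismatches and thermodynamics.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_crosstalk_sites(primer, payload, k=6):
    """Return payload positions where the primer's 3' tail could pair.

    A site is reported when the window equals the last k bases of the
    primer (pairing with the complementary strand) or their reverse
    complement (pairing with the given strand).
    """
    tail = primer[-k:]
    targets = {tail, reverse_complement(tail)}
    return [
        i
        for i in range(len(payload) - k + 1)
        if payload[i:i + k] in targets
    ]


if __name__ == "__main__":
    primer = "GGATCCAAGT"          # hypothetical primer; 3' tail is "AAGT" for k=4
    payload = "CCACTTGGAAGTCC"     # hypothetical payload strand
    print(three_prime_crosstalk_sites(primer, payload, k=4))
```

A codebook designed to avoid cross-talk would aim to make this list empty for every primer in the library.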

  • Article type: Journal Article
    The exponential growth in data volume has necessitated the adoption of alternative storage solutions, and DNA storage stands out as the most promising one. However, the exorbitant costs associated with synthesis and sequencing have impeded its development. Pre-compressing the data is recognized as one of the most effective approaches for reducing storage costs. However, different compression methods yield varying compression ratios for the same file, and compressing a large number of files with a single method may not achieve the maximum compression ratio. This study proposes a multi-file dynamic compression method based on machine learning classification algorithms that selects the appropriate compression method for each file, minimizing the amount of data stored in DNA as far as possible. First, four different compression methods are applied to the collected files. Subsequently, the optimal compression method for each file is used as the label and the file type and size are used as features to train seven machine learning classification algorithms. The results demonstrate that k-nearest neighbor outperforms the other machine learning algorithms on the validation and test sets most of the time, achieving an accuracy of over 85% and showing less volatility. Additionally, a compression rate of 30.85% can be achieved with the k-nearest neighbor model, more than 4.5% better than the traditional single compression method, resulting in significant cost savings for DNA storage in the range of $0.48 to 3 billion/TB. Compared with the traditional compression method, the multi-file dynamic compression method demonstrates a more significant compression effect when compressing multiple files. Therefore, it can considerably decrease the cost of DNA storage and facilitate the widespread implementation of DNA storage technology.
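
The labeling step described in this abstract (compress each file with every candidate method and record the winner) can be sketched with Python's standard library. The abstract does not name its four compression methods, so the three codecs below (`zlib`, `bz2`, `lzma`) are stand-ins, and `best_compressor` and `label_files` are hypothetical helper names. The classifier training on file type and size is not shown.

```python
import bz2
import lzma
import zlib

# Stand-in codecs; the study's actual four methods are not listed in the abstract.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def best_compressor(data):
    """Compress data with every codec; return (winning name, all sizes in bytes)."""
    sizes = {name: len(fn(data)) for name, fn in COMPRESSORS.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes


def label_files(files):
    """Map each file name to the codec that produced the smallest output."""
    return {name: best_compressor(data)[0] for name, data in files.items()}


if __name__ == "__main__":
    samples = {
        "repetitive.txt": b"abc" * 1000,
        "short.bin": bytes(range(64)),
    }
    for name, label in label_files(samples).items():
        print(name, "->", label)
```

The resulting labels, paired with features such as file type and size, would form the training set for a classifier such as k-nearest neighbor.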

  • Article type: Journal Article
    Robust encapsulation and controllable release of biomolecules have wide biomedical applications, ranging from biosensing and drug delivery to information storage. However, conventional biomolecule encapsulation strategies are limited by complicated operations, optical instability, and difficulty in decapsulation. Here, we report a simple, robust, and solvent-free biomolecule encapsulation strategy based on gallium liquid metal, which features a low-temperature phase transition, self-healing, highly hermetic sealing, and intrinsic resistance to optical damage. We sandwiched the biomolecules between solid gallium films and then welded the films at low temperature for direct sealing. The gallium not only protects DNA and enzymes from various kinds of physical and chemical damage but also allows the on-demand release of biomolecules by applying vibration to break the liquid gallium. We demonstrated that a DNA-coded image file can be recovered with up to 99.9% sequence retention after an accelerated aging test. We also showed a practical application of the controllable release of bioreagents in a one-pot RPA-CRISPR/Cas12a reaction for SARS-CoV-2 screening, with a low detection limit of 10 copies within 40 min. This work may facilitate the development of robust and stimuli-responsive biomolecule capsules by using low-melting metals for biotechnology.

  • Article type: Journal Article
    DNA, as the storage medium in organisms, can address the shortcomings of existing electromagnetic storage media, such as low information density, high maintenance power consumption, and short storage time. Current research on DNA storage mainly focuses on designing encoders that convert binary data into DNA base data that meets biological constraints. We have created a new Chinese character code table that enables exceptionally high information storage density for Chinese characters (compared with traditional UTF-8 encoding). To meet biological constraints, we have devised a DNA shift coding scheme with low algorithmic complexity, which can encode any DNA strand, even one that contains excessively long homopolymers. The designed DNA sequence will be stored in a double-stranded plasmid of 744 bp, ensuring high reliability during storage. Additionally, the plasmid's resistance to environmental interference ensures long-term stable information storage. Moreover, it can be replicated at a lower cost.
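
The density gain from a dedicated character table over UTF-8 can be shown with a short calculation. This is a back-of-the-envelope sketch: it assumes an unconstrained mapping of 2 bits per nucleotide and a hypothetical table of 8,000 characters. The abstract gives neither the table size nor the actual code, and `nt_per_symbol` and `utf8_nt` are illustrative names.

```python
def nt_per_symbol(table_size):
    """Minimum number of nucleotides needed to index table_size symbols (base 4)."""
    n = 0
    capacity = 1
    while capacity < table_size:
        capacity *= 4
        n += 1
    return n


def utf8_nt(ch):
    """Nucleotides needed for one character's UTF-8 bytes at 2 bits per nucleotide."""
    return len(ch.encode("utf-8")) * 4


if __name__ == "__main__":
    table_size = 8000  # hypothetical size of a Chinese character code table
    print("UTF-8:", utf8_nt("中"), "nt per character")
    print("Dedicated table:", nt_per_symbol(table_size), "nt per character")
```

Under these assumptions a common Chinese character costs 12 nt in UTF-8 but 7 nt with the dedicated table, before any overhead for biological constraints.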

  • Article type: Journal Article
    BACKGROUND: In single-stranded DNAs/RNAs, secondary structures are very common, especially in long sequences. It has been recognized that a high degree of secondary structure in DNA sequences can interfere with the correct writing and reading of information in DNA storage. However, how to circumvent this side effect has seldom been studied.
    METHODS: As the degree of secondary structure of a DNA sequence is closely related to the magnitude of the free energy released in the complicated folding process, we first investigate the free-energy distribution at different encoding lengths based on randomly generated DNA sequences. Then, we construct a bidirectional long short-term memory (BiLSTM)-attention deep learning model to predict the free energy of sequences.
    RESULTS: Our simulation results indicate that the free energy of DNA sequences of a specific length follows a right-skewed distribution and that the mean increases with length. Given a tolerable free energy threshold of 20 kcal/mol, we can keep the ratio of serious secondary structures in the encoding sequences within a significance level of 1% by selecting a feasible encoding length of 100 nt. Compared with traditional deep learning models, the proposed model achieves better prediction performance in both the mean relative error (MRE) and the coefficient of determination (R²), reaching MRE = 0.109 and R² = 0.918 in the simulation experiment. The combination of the BiLSTM and the attention module can handle long-term dependencies and capture the features of base pairing. Further, the prediction has linear time complexity, which makes it suitable for detecting sequences with severe secondary structures in future large-scale applications. Finally, 70 of the 94 predicted free energies can be screened out on a real dataset. This demonstrates that the proposed model can screen out some highly suspicious sequences that are prone to produce more errors and low sequencing copy numbers.

  • Article type: Journal Article
    DNA is a high-density, long-term stable, and scalable storage medium that can meet the increased demands on storage media resulting from the exponential growth of data. Existing DNA storage encoding schemes tend to achieve high-density storage but do not fully consider the local and global stability of DNA sequences or the read and write accuracy of the stored information. To address these problems, this article presents a graph-based De Bruijn Trim Rotation Graph (DBTRG) encoding scheme. Through XOR between the proposed dynamic binary sequence and the original binary sequence, k-mers can be divided into the De Bruijn Trim graph, and the stored information can be compressed according to the overlapping relationship. The simulated experimental results show that DBTRG ensures base balance and diversity, reduces the likelihood of undesired motifs, and improves the stability of DNA storage and data recovery. Furthermore, DBTRG maintains an encoding rate of 1.92 when storing 510 KB images and introduces novel approaches and concepts for DNA storage encoding methods.
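
The XOR step mentioned in this abstract can be illustrated on its own: mask the data bits with a second binary sequence, map bit pairs to bases, and undo both steps on readout. This is only a sketch. The paper's "dynamic binary sequence" is not specified in the abstract, so a seeded pseudo-random mask stands in for it, and the De Bruijn Trim graph stage is omitted entirely. All names (`mask_bits`, `bits_to_dna`, `dna_to_bits`) are hypothetical.

```python
import random

BASES = "ACGT"  # fixed 2-bit mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def mask_bits(bits, seed):
    """XOR bits with a seeded pseudo-random mask; applying it twice restores them."""
    rng = random.Random(seed)
    return [b ^ rng.getrandbits(1) for b in bits]


def bits_to_dna(bits):
    """Map each pair of bits to one base."""
    if len(bits) % 2:
        raise ValueError("bit sequence must have even length")
    return "".join(
        BASES[2 * bits[i] + bits[i + 1]] for i in range(0, len(bits), 2)
    )


def dna_to_bits(seq):
    """Inverse of bits_to_dna."""
    bits = []
    for base in seq:
        value = BASES.index(base)
        bits.extend([value >> 1, value & 1])
    return bits


if __name__ == "__main__":
    data = [0] * 16                      # all-zero input would map to a poly-A run
    strand = bits_to_dna(mask_bits(data, seed=42))
    recovered = mask_bits(dna_to_bits(strand), seed=42)
    print(strand, recovered == data)
```

Masking breaks up long runs of identical input bits before the base mapping, which is one common motivation for an XOR stage in DNA encoders.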

  • Article type: Journal Article
    DNA is an attractive medium for long-term data storage because of its density, ease of copying, sustainability, and longevity. Recent advances have focused on the development of new encoding algorithms, automation, and sequencing technologies. Despite progress in these subareas, the most challenging hurdle in the deployment of DNA storage remains the reliability of preservation and the repeatability of reading. Herein, we report the construction of a magnetic bead spherical nucleic acid (MB-SNA) composite microstructure and its use as a cost-effective platform for reliable DNA preservation and repeated reading. MB-SNA has an inner core of silica@γ-Fe2O3@silica microbeads and an outer spherical shell of double-stranded DNA (dsDNA) with a density as high as 34 pmol/cm². In MB-SNA, each strand of dsDNA stores a piece of data, and the high-density packing of dsDNA achieves high-capacity storage. MB-SNA is more reliable for preservation than free DNA. Accelerated aging tests demonstrate that the data in MB-SNA remain readable after 0.23 million years of preservation at -18 °C and 50% relative humidity. Moreover, MB-SNA facilitates repeated reading through facile PCR-magnetic separation. After 10 cycles of PCR access, the retention rate of dsDNA for MB-SNA is as high as 93%, and the sequencing accuracy is more than 98%. In addition, MB-SNA makes cost-effective DNA storage feasible. By serial dilution, the physical limit for MB-SNA to achieve accurate reading is found to be as low as two microstructures.