Keywords: DNA; DNA storage; Data forensics; Data recovery; Fountain codes; Reconstruction

Source: DOI:10.1016/j.csbj.2024.04.048 PDF (PubMed)

Abstract:
Today's digital data storage systems typically offer advanced data recovery solutions to address catastrophic data loss, such as software-based disk sector analysis or physical-level data retrieval methods for conventional hard disk drives. However, DNA-based data storage currently relies solely on the inherent error correction properties of the methods used to encode digital data into strands of DNA. Any error that cannot be corrected using the redundancy added by DNA encoding methods results in permanent data loss. To provide data recovery for DNA storage systems, we present a method to automatically reconstruct corrupted or missing data stored in DNA using fountain codes. Our method exploits the relationships between packets encoded with fountain codes to identify and rectify corrupted or lost data. Furthermore, we present file type-specific and content-based data recovery methods for three file types, illustrating how a fusion of fountain encoding-specific redundancy and knowledge about the data can effectively recover information in a corrupted DNA storage system, both automatically and in a guided manual manner. To demonstrate our approach, we introduce DR4DNA, a software toolkit that contains all methods presented. We evaluate DR4DNA using both in-silico and in-vitro experiments.
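The idea of exploiting relationships between fountain-coded packets can be illustrated with a minimal sketch. It is not the DR4DNA implementation, and it rests on several assumptions not taken from the abstract: each packet is the XOR of a known subset of single-byte chunks, the packet set is hand-picked rather than drawn from a real degree distribution, exactly one packet is corrupted, and the suspect is found by a simple leave-one-out consistency check. The names `eliminate`, `solve`, and `suspects` are hypothetical.

```python
# Illustrative sketch only (assumed toy setup, not the DR4DNA algorithm).
# A packet is (mask, payload): bit i of mask set means chunk i is XORed in.

def eliminate(packets):
    """Gaussian elimination over GF(2); returns (consistent, basis)."""
    basis = {}  # pivot bit -> (mask, payload) whose highest set bit is pivot
    consistent = True
    for mask, payload in packets:
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (mask, payload)
                break
            bmask, bpayload = basis[pivot]
            mask ^= bmask
            payload ^= bpayload
        else:
            # The packet is a combination of earlier ones; a nonzero
            # leftover payload means the packets contradict each other.
            if payload != 0:
                consistent = False
    return consistent, basis

def solve(basis, n_chunks):
    """Back-substitute to recover the chunks; None if rank is too low."""
    if len(basis) < n_chunks:
        return None
    chunks = [0] * n_chunks
    for pivot in sorted(basis):
        mask, value = basis[pivot]
        for bit in range(pivot):
            if (mask >> bit) & 1:
                value ^= chunks[bit]
        chunks[pivot] = value
    return chunks

def suspects(packets):
    """Indices whose removal alone makes the remaining packets consistent."""
    found = []
    for i in range(len(packets)):
        ok, _ = eliminate(packets[:i] + packets[i + 1:])
        if ok:
            found.append(i)
    return found

# Original chunks 0x41, 0x42, 0x43; packet 1 should carry 0x03 but was
# corrupted to 0x07 (hand-picked example data).
packets = [
    (0b001, 0x41),
    (0b011, 0x07),  # corrupted payload
    (0b110, 0x01),
    (0b101, 0x02),
    (0b111, 0x40),
    (0b010, 0x42),
]

is_consistent, _ = eliminate(packets)
bad = suspects(packets)
clean = [p for i, p in enumerate(packets) if i not in bad]
recovered = solve(eliminate(clean)[1], 3)
```

In this toy case the redundant packets contradict each other until packet 1 is left out, after which the remaining five packets still determine all three chunks. With less redundancy, several packets can be equally plausible suspects. That ambiguity is the kind of case where the file type-specific knowledge and guided manual recovery described in the abstract could help narrow the candidates.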