Motif caller for sequence reconstruction in motif-based DNA storage

Bibliographic Details
Published in: Scientific Reports (Nature Publisher Group) vol. 15, no. 1 (2025), p. 39236-39248
Main Author: Agarwal, Parv
Other Authors: Pinnamaneni, Nimesh; Heinis, Thomas
Published: Nature Publishing Group
Description
Abstract: DNA data storage is rapidly emerging as a promising solution for long-term data archiving, largely due to its exceptional durability. However, the synthesis of DNA strands remains a significant bottleneck in terms of cost and speed. To address this, new methods have been developed that encode information by assembling long data-carrying DNA sequences from pre-synthesized DNA subsequences, known as motifs, drawn from a library. Reading back data from DNA storage relies on basecalling, the process of translating raw nanopore sequencing signals into DNA base sequences using machine learning models. These sequences are then decoded back into binary data. However, current basecalling approaches are not optimized for decoding motif-carrying DNA: they first predict individual bases from the raw signal and only afterward attempt to identify higher-level motifs. This two-step, motif-agnostic process is both imprecise and inefficient. In this paper we introduce Motif Caller, a machine learning model designed to directly detect entire motifs from raw nanopore signals, bypassing the need for intermediate basecalling. By targeting motifs directly, Motif Caller leverages richer signal features associated with each motif, resulting in significantly improved accuracy. This direct approach also enhances the efficiency of data retrieval in motif-based DNA storage systems.
ISSN:2045-2322
DOI:10.1038/s41598-025-22798-2
Source: Science Database