Motif caller for sequence reconstruction in motif-based DNA storage

Bibliographic details
Published in: Scientific Reports (Nature Publisher Group) vol. 15, no. 1 (2025), p. 39236-39248
Main author: Agarwal, Parv
Other authors: Pinnamaneni, Nimesh; Heinis, Thomas
Published: Nature Publishing Group
Subjects:
Online access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3270646425
003 UK-CbPIL
022 |a 2045-2322 
024 7 |a 10.1038/s41598-025-22798-2  |2 doi 
035 |a 3270646425 
045 2 |b d20250101  |b d20251231 
084 |a 274855  |2 nlm 
100 1 |a Agarwal, Parv  |u Department of Computing, Imperial College London, London, UK (ROR: https://ror.org/041kmwe10) (GRID: grid.7445.2) (ISNI: 0000 0001 2113 8111) 
245 1 |a Motif caller for sequence reconstruction in motif-based DNA storage 
260 |b Nature Publishing Group  |c 2025 
513 |a Journal Article 
520 3 |a DNA data storage is rapidly emerging as a promising solution for long-term data archiving, largely due to its exceptional durability. However, the synthesis of DNA strands remains a significant bottleneck in terms of cost and speed. To address this, new methods have been developed that encode information by concatenating pre-synthesized DNA subsequences, known as motifs, from a library into long data-carrying DNA sequences. Reading back data from DNA storage relies on basecalling, the process of translating raw nanopore sequencing signals into DNA base sequences using machine learning models. These sequences are then decoded back into binary data. However, current basecalling approaches are not optimized for decoding motif-carrying DNA: they first predict individual bases from the raw signal and only afterward attempt to identify higher-level motifs. This two-step, motif-agnostic process is both imprecise and inefficient. In this paper we introduce Motif Caller, a machine learning model designed to directly detect entire motifs from raw nanopore signals, bypassing the need for intermediate basecalling. By targeting motifs directly, Motif Caller leverages richer signal features associated with each motif, resulting in significantly improved accuracy. This direct approach also enhances the efficiency of data retrieval in motif-based DNA storage systems. 
653 |a Machine learning 
653 |a DNA sequencing 
653 |a DNA biosynthesis 
653 |a Information storage 
653 |a Costs 
653 |a Data storage 
653 |a Deoxyribonucleic acid--DNA 
653 |a Information processing 
653 |a Nucleotide sequence 
653 |a Learning algorithms 
653 |a Economic 
700 1 |a Pinnamaneni, Nimesh  |u Helixworks Technologies, Cork, Ireland 
700 1 |a Heinis, Thomas  |u Department of Computing, Imperial College London, London, UK (ROR: https://ror.org/041kmwe10) (GRID: grid.7445.2) (ISNI: 0000 0001 2113 8111) 
773 0 |t Scientific Reports (Nature Publisher Group)  |g vol. 15, no. 1 (2025), p. 39236-39248 
786 0 |d ProQuest  |t Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3270646425/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3270646425/fulltext/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3270646425/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch