Motif caller for sequence reconstruction in motif-based DNA storage
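| Published in: | Scientific Reports (Nature Publisher Group) vol. 15, no. 1 (2025), p. 39236-39248 |
|---|---|
| Main author: | Agarwal, Parv |
| Other authors: | Pinnamaneni, Nimesh; Heinis, Thomas |
| Published: | Nature Publishing Group |
| Subjects: | Machine learning; DNA sequencing; DNA biosynthesis; Information storage; Costs; Data storage; Deoxyribonucleic acid--DNA; Information processing; Nucleotide sequence; Learning algorithms; Economic |
| Online access: | Citation/Abstract; Full Text; Full Text - PDF |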
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3270646425 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2045-2322 | ||
| 024 | 7 | |a 10.1038/s41598-025-22798-2 |2 doi | |
| 035 | |a 3270646425 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 274855 |2 nlm | ||
| 100 | 1 | |a Agarwal, Parv |u Department of Computing, Imperial College London, London, UK (ROR: https://ror.org/041kmwe10) (GRID: grid.7445.2) (ISNI: 0000 0001 2113 8111) | |
| 245 | 1 | |a Motif caller for sequence reconstruction in motif-based DNA storage | |
| 260 | |b Nature Publishing Group |c 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a DNA data storage is rapidly emerging as a promising solution for long-term data archiving, largely due to its exceptional durability. However, the synthesis of DNA strands remains a significant bottleneck in terms of cost and speed. To address this, new methods have been developed that encode information by concatenating long data-carrying DNA sequences from pre-synthesized DNA subsequences – known as motifs – from a library. Reading back data from DNA storage relies on basecalling, the process of translating raw nanopore sequencing signals into DNA base sequences using machine learning models. These sequences are then decoded back into binary data. However, current basecalling approaches are not optimized for decoding motif-carrying DNA: they first predict individual bases from the raw signal and only afterward attempt to identify higher-level motifs. This two-step, motif-agnostic process is both imprecise and inefficient. In this paper, we introduce Motif Caller, a machine learning model designed to directly detect entire motifs from raw nanopore signals, bypassing the need for intermediate basecalling. By targeting motifs directly, Motif Caller leverages richer signal features associated with each motif, resulting in significantly improved accuracy. This direct approach also enhances the efficiency of data retrieval in motif-based DNA storage systems. | |
| 653 | |a Machine learning | ||
| 653 | |a DNA sequencing | ||
| 653 | |a DNA biosynthesis | ||
| 653 | |a Information storage | ||
| 653 | |a Costs | ||
| 653 | |a Data storage | ||
| 653 | |a Deoxyribonucleic acid--DNA | ||
| 653 | |a Information processing | ||
| 653 | |a Nucleotide sequence | ||
| 653 | |a Learning algorithms | ||
| 653 | |a Economic | ||
| 700 | 1 | |a Pinnamaneni, Nimesh |u Helixworks Technologies, Cork, Ireland | |
| 700 | 1 | |a Heinis, Thomas |u Department of Computing, Imperial College London, London, UK (ROR: https://ror.org/041kmwe10) (GRID: grid.7445.2) (ISNI: 0000 0001 2113 8111) | |
| 773 | 0 | |t Scientific Reports (Nature Publisher Group) |g vol. 15, no. 1 (2025), p. 39236-39248 | |
| 786 | 0 | |d ProQuest |t Science Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3270646425/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text |u https://www.proquest.com/docview/3270646425/fulltext/embedded/H09TXR3UUZB2ISDL?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3270646425/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch |
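
The abstract in field 520 contrasts motif-based encoding with a two-step readout: basecall first, then look for motifs. The sketch below illustrates only that baseline idea and is not the authors' Motif Caller, which works on raw nanopore signals and is not reproduced here. The motif library, the motif length, and the function names are hypothetical, and the matcher assumes substitution errors only (real reads also contain insertions and deletions, which would require alignment).

```python
# Hypothetical 4-motif library; real libraries and motif lengths differ.
MOTIF_LIBRARY = {
    0: "ACGTACGT",
    1: "TTGACCAG",
    2: "GGCATTCA",
    3: "CATGGTAC",
}
MOTIF_LEN = 8  # assumed fixed motif length for this sketch


def encode(symbols):
    """Build a strand by concatenating one library motif per symbol."""
    return "".join(MOTIF_LIBRARY[s] for s in symbols)


def hamming(a, b):
    """Count mismatching positions; a length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def decode_two_step(basecalled):
    """Baseline readout on an already basecalled read.

    Splits the read into fixed-size windows and assigns each window to the
    library motif with the fewest mismatches (substitution errors only).
    """
    symbols = []
    for start in range(0, len(basecalled), MOTIF_LEN):
        window = basecalled[start:start + MOTIF_LEN]
        best = min(MOTIF_LIBRARY, key=lambda s: hamming(window, MOTIF_LIBRARY[s]))
        symbols.append(best)
    return symbols


strand = encode([2, 0, 3, 1])
# One substituted base in the first motif (A -> T at position 3).
noisy = "GGCTTTCA" + strand[MOTIF_LEN:]
print(decode_two_step(noisy))
```

Because each window is matched only after the bases have been called, any basecalling error propagates into the motif assignment; that dependency is the limitation the abstract attributes to motif-agnostic pipelines.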