SamQL: a structured query language and filtering tool for the SAM/BAM file format
| Published in: | BMC Bioinformatics vol. 22 (2021), p. 1 |
|---|---|
| Main author: | Lee, Christopher T |
| Other authors: | Maragkakis, Manolis |
| Published: | Springer Nature B.V. |
| Subjects: | |
| Links: | Citation/Abstract, Full Text, Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 2582941463 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 1471-2105 | ||
| 024 | 7 | |a 10.1186/s12859-021-04390-3 |2 doi | |
| 035 | |a 2582941463 | ||
| 045 | 2 | |b d20210101 |b d20211231 | |
| 084 | |a 58459 |2 nlm | ||
| 100 | 1 | |a Lee, Christopher T | |
| 245 | 1 | |a SamQL: a structured query language and filtering tool for the SAM/BAM file format | |
| 260 | |b Springer Nature B.V. |c 2021 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a Background: The Sequence Alignment/Map Format Specification (SAM) is one of the most widely adopted file formats in bioinformatics and many researchers use it daily. Several tools, including most high-throughput sequencing read aligners, use it as their primary output and many more tools have been developed to process it. However, despite its flexibility, SAM-encoded files can often be difficult to query and understand even for experienced bioinformaticians. As genomic data grow rapidly, structured and efficient queries on data encoded in SAM/BAM files are becoming increasingly important. Existing tools are very limited in their query capabilities or are not efficient. Critically, new tools that address these shortcomings should not only be able to support existing large datasets but should also do so without requiring massive data transformations and file infrastructure reorganizations. Results: Here we introduce SamQL, an SQL-like query language for the SAM format with intuitive syntax that supports complex and efficient queries on top of SAM/BAM files and that can replace commonly used Bash one-liners employed by many bioinformaticians. SamQL has high expressive power with no upper limit on query size and, when parallelized, outperforms other, substantially less expressive software. Conclusions: SamQL is a complete query language that we envision as a step toward a structured database engine for genomics. SamQL is written in Go, and is freely available as a standalone program and as an open-source library under an MIT license, https://github.com/maragkakislab/samql/. | |
| 653 | |a Language | ||
| 653 | |a Software | ||
| 653 | |a Linings | ||
| 653 | |a Computer programs | ||
| 653 | |a Source code | ||
| 653 | |a Datasets | ||
| 653 | |a Syntax | ||
| 653 | |a Next-generation sequencing | ||
| 653 | |a Bioinformatics | ||
| 653 | |a Queries | ||
| 653 | |a Tools | ||
| 653 | |a Format | ||
| 653 | |a Genomics | ||
| 653 | |a Nucleotide sequence | ||
| 653 | |a Structured Query Language-SQL | ||
| 653 | |a Keywords | ||
| 653 | |a Query languages | ||
| 653 | |a Economic | ||
| 700 | 1 | |a Maragkakis, Manolis | |
| 773 | 0 | |t BMC Bioinformatics |g vol. 22 (2021), p. 1 | |
| 786 | 0 | |d ProQuest |t Health & Medical Collection | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/2582941463/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text |u https://www.proquest.com/docview/2582941463/fulltext/embedded/ZKJTFFSVAI7CB62C?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/2582941463/fulltextPDF/embedded/ZKJTFFSVAI7CB62C?source=fedsrch |