SamQL: a structured query language and filtering tool for the SAM/BAM file format

Bibliographic Details
Published in: BMC Bioinformatics vol. 22 (2021), p. 1
Main Author: Lee, Christopher T
Other Authors: Maragkakis, Manolis
Published:
Springer Nature B.V.
Links: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 2582941463
003 UK-CbPIL
022 |a 1471-2105 
024 7 |a 10.1186/s12859-021-04390-3  |2 doi 
035 |a 2582941463 
045 2 |b d20210101  |b d20211231 
084 |a 58459  |2 nlm 
100 1 |a Lee, Christopher T 
245 1 |a SamQL: a structured query language and filtering tool for the SAM/BAM file format 
260 |b Springer Nature B.V.  |c 2021 
513 |a Journal Article 
520 3 |a Background: The Sequence Alignment/Map Format Specification (SAM) is one of the most widely adopted file formats in bioinformatics and many researchers use it daily. Several tools, including most high-throughput sequencing read aligners, use it as their primary output and many more tools have been developed to process it. However, despite its flexibility, SAM-encoded files can often be difficult to query and understand even for experienced bioinformaticians. As genomic data are rapidly growing, structured and efficient queries on data that are encoded in SAM/BAM files are becoming increasingly important. Existing tools are very limited in their query capabilities or are not efficient. Critically, new tools that address these shortcomings should not only be able to support existing large datasets but should also do so without requiring massive data transformations and file infrastructure reorganizations. Results: Here we introduce SamQL, an SQL-like query language for the SAM format with intuitive syntax that supports complex and efficient queries on top of SAM/BAM files and that can replace commonly used Bash one-liners employed by many bioinformaticians. SamQL has high expressive power with no upper limit on query size and, when parallelized, outperforms other substantially less expressive software. Conclusions: SamQL is a complete query language that we envision as a step to a structured database engine for genomics. SamQL is written in Go, and is freely available as a standalone program and as an open-source library under an MIT license at https://github.com/maragkakislab/samql/. 
653 |a Language 
653 |a Software 
653 |a Linings 
653 |a Computer programs 
653 |a Source code 
653 |a Datasets 
653 |a Syntax 
653 |a Next-generation sequencing 
653 |a Bioinformatics 
653 |a Queries 
653 |a Tools 
653 |a Format 
653 |a Genomics 
653 |a Nucleotide sequence 
653 |a Structured Query Language-SQL 
653 |a Keywords 
653 |a Query languages 
653 |a Economic 
700 1 |a Maragkakis, Manolis 
773 0 |t BMC Bioinformatics  |g vol. 22 (2021), p. 1 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2582941463/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/2582941463/fulltext/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/2582941463/fulltextPDF/embedded/ZKJTFFSVAI7CB62C?source=fedsrch