Fast noisy long read alignment with multi-level parallelism


Bibliographic details
Published in:	BMC Bioinformatics vol. 26 (2025), p. 1
Main author:	Xia, Zeyu
Other authors:	Yang, Canqun; Peng, Chenchen; Guo, Yifei; Guo, Yufei; Tang, Tao; Cui, Yingbo
Publisher:	Springer Nature B.V.
Subjects:	Background noise; Parallel processing; DNA sequencing; Dynamic programming; Alignment; Performance evaluation; Algorithms; Communication; Redesign; Seeds; Effectiveness; Regions; Genomes; Computer applications; Efficiency; Economic
Online access:	Citation/Abstract; Full Text; Full Text - PDF

Description
Abstract:	Background: The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignment tools inapplicable. Additionally, the performance bottleneck of a single CPU restricts the effectiveness of alignment algorithms for SMRT sequencing.
Results: To address these challenges, we introduce ParaHAT, a parallel alignment algorithm for noisy long reads. ParaHAT exploits vector-level, thread-level, process-level, and heterogeneous parallelism. We redesign the layouts of the dynamic programming matrices to eliminate data dependencies in base-level alignment, enabling effective vectorization. We further improve computational speed through heterogeneous parallel computing and implement the algorithm for multi-node execution using MPI, overcoming the computational limits of a single node.
Conclusions: Performance evaluations show that ParaHAT achieved a 10.03× speedup in base-level alignment, with a parallel acceleration ratio of 94.61 and a weak scalability metric of 98.98% on 128 nodes.
ISSN:	1471-2105
DOI:	10.1186/s12859-025-06129-w
Source:	Health & Medical Collection
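The abstract says ParaHAT redesigns its dynamic programming matrix layouts to remove data dependencies in base-level alignment, which is what makes vectorization possible. The record does not describe the actual layout. A common way to get this effect is to sweep the matrix by anti-diagonals: every cell with the same i + j depends only on the two previous anti-diagonals, so all cells on one diagonal can be computed at the same time, for example in SIMD lanes. The sketch below is a generic illustration under stated assumptions. It uses unit-cost edit distance rather than ParaHAT's scoring scheme, uses plain Python rather than SIMD intrinsics, and the function name `edit_distance_antidiagonal` is hypothetical.

```python
def edit_distance_antidiagonal(a: str, b: str) -> int:
    """Unit-cost edit distance computed in anti-diagonal (wavefront) order.

    Hypothetical illustration of dependency-free DP layouts; not ParaHAT's code.
    """
    n, m = len(a), len(b)
    # D[i][j] = edit distance between the prefixes a[:i] and b[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all i characters of a
    for j in range(m + 1):
        D[0][j] = j  # insert all j characters of b

    # Interior cells satisfy 1 <= i <= n, 1 <= j <= m, so k = i + j runs from 2 to n + m.
    for k in range(2, n + m + 1):
        rows = range(max(1, k - m), min(n, k - 1) + 1)
        # Each cell on anti-diagonal k reads only diagonals k-1 (up, left) and
        # k-2 (up-left). No cell reads another cell on diagonal k, so this whole
        # list could be evaluated in parallel, e.g. one SIMD lane per cell.
        new_vals = [
            min(
                D[i - 1][k - i] + 1,        # deletion (from diagonal k-1)
                D[i][k - i - 1] + 1,        # insertion (from diagonal k-1)
                D[i - 1][k - i - 1]         # match/mismatch (from diagonal k-2)
                + (a[i - 1] != b[k - i - 1]),
            )
            for i in rows
        ]
        # Write the finished diagonal back only after all of it is computed.
        for i, v in zip(rows, new_vals):
            D[i][k - i] = v
    return D[n][m]


if __name__ == "__main__":
    print(edit_distance_antidiagonal("ACGTTA", "ACTTA"))  # 1 (delete the G)
```

Reading the two previous diagonals and writing the current one in a separate pass mirrors how a vectorized kernel would load, compute, and store a full diagonal at once. A production aligner would usually store only the last two or three diagonals, and restrict work to a band, rather than keep the full matrix.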