Performance models of data parallel DAG workflows for large scale data analytics

Bibliographic details
Published in: Distributed and Parallel Databases vol. 41, no. 3 (Sep 2023), p. 299
Main author: Shi, Juwei
Other authors: Lu, Jiaheng
Published:
Springer Nature B.V.
Subjects:
Online access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3255421070
003 UK-CbPIL
022 |a 0926-8782 
022 |a 1573-7578 
024 7 |a 10.1007/s10619-023-07425-1  |2 doi 
035 |a 3255421070 
045 2 |b d20230901  |b d20230930 
100 1 |a Shi, Juwei  |u Microsoft STCA, Beijing, China 
245 1 |a Performance models of data parallel DAG workflows for large scale data analytics 
260 |b Springer Nature B.V.  |c Sep 2023 
513 |a Journal Article 
520 3 |a Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. The performance model for a DAG on data-parallel frameworks (e.g., MapReduce) is a research challenge because the allocation of preemptable system resources among parallel jobs may dynamically vary during execution. This resource allocation variation during execution makes it difficult to accurately estimate the execution time. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), to estimate the allocation of preemptable resources by identifying the bottleneck to accurately predict task execution time. For a DAG workflow, we propose a state-based approach to iteratively use the resource allocation property among stages to estimate the overall execution plan. Furthermore, to handle the skewness of various jobs, we refine the model with the order statistics theory to improve estimation accuracy. Extensive experiments were performed to validate these cost models with HiBench and TPC-H workloads. The BOE model outperforms the state-of-the-art models by a factor of five for task execution time estimation. For the refined skew-aware model, the average prediction error is under 3% when estimating the execution time of 51 hybrid analytics (HiBench) and query (TPC-H) DAG workflows. 
653 |a Skewness 
653 |a Workloads 
653 |a Estimates 
653 |a Distributed processing 
653 |a Estimation 
653 |a Resource allocation 
700 1 |a Lu, Jiaheng  |u University of Helsinki, Helsinki, Finland (GRID:grid.7737.4) (ISNI:0000 0004 0410 2071) 
773 0 |t Distributed and Parallel Databases  |g vol. 41, no. 3 (Sep 2023), p. 299 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3255421070/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3255421070/fulltext/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3255421070/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch