SimCost: cost-effective resource provision prediction and recommendation for spark workloads

Bibliographic Details
Published in: Distributed and Parallel Databases vol. 42, no. 1 (Mar 2024), p. 73
Main Author: Chen, Yuxing
Other Authors: Hoque, Mohammad A., Xu, Pengfei, Lu, Jiaheng, Tarkoma, Sasu
Published:
Springer Nature B.V.
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3255419673
003 UK-CbPIL
022 |a 0926-8782 
022 |a 1573-7578 
024 7 |a 10.1007/s10619-023-07436-y  |2 doi 
035 |a 3255419673 
045 2 |b d20240301  |b d20240331 
100 1 |a Chen, Yuxing  |u University of Helsinki, Department of Computer Science, Helsinki, Finland (GRID:grid.7737.4) (ISNI:0000 0004 0410 2071) 
245 1 |a SimCost: cost-effective resource provision prediction and recommendation for spark workloads 
260 |b Springer Nature B.V.  |c Mar 2024 
513 |a Journal Article 
520 3 |a Spark is one of the most popular big data analytical platforms. To save time, achieve high resource utilization, and remain cost-effective for Spark jobs, it is challenging but imperative for data scientists to configure suitable resource portions. In this paper, we investigate the proper parameter values that meet workloads’ performance requirements with minimized resource cost and resource utilization time. We propose SimCost, a simulation-based cost model, to predict the performance of jobs accurately. We achieve low-cost training by taking advantage of a simulation framework, i.e., Monte Carlo simulation, which uses a small amount of data and resources to make a reliable prediction for larger datasets and clusters. Our method’s salient feature is that it allows us to invest low training costs while obtaining an accurate prediction. Through empirical experiments with 12 benchmark workloads, we show that the cost model yields an average prediction error of less than 5%, and the recommendation achieves up to 6x resource cost saving. 
653 |a Workload 
653 |a Big Data 
653 |a Machine learning 
653 |a Simulation 
653 |a Algorithms 
653 |a Resource utilization 
653 |a Salience 
653 |a Cloud computing 
653 |a Workloads 
653 |a Monte Carlo simulation 
653 |a Cost effectiveness 
700 1 |a Hoque, Mohammad A.  |u University of Helsinki, Department of Computer Science, Helsinki, Finland (GRID:grid.7737.4) (ISNI:0000 0004 0410 2071) 
700 1 |a Xu, Pengfei  |u University of Helsinki, Department of Computer Science, Helsinki, Finland (GRID:grid.7737.4) (ISNI:0000 0004 0410 2071) 
700 1 |a Lu, Jiaheng  |u University of Helsinki, Department of Computer Science, Helsinki, Finland (GRID:grid.7737.4) (ISNI:0000 0004 0410 2071) 
700 1 |a Tarkoma, Sasu  |u University of Helsinki, Department of Computer Science, Helsinki, Finland (GRID:grid.7737.4) (ISNI:0000 0004 0410 2071) 
773 0 |t Distributed and Parallel Databases  |g vol. 42, no. 1 (Mar 2024), p. 73 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3255419673/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3255419673/fulltext/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3255419673/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch