SimCost: cost-effective resource provision prediction and recommendation for spark workloads

Bibliographic Details
Container / Database: Distributed and Parallel Databases vol. 42, no. 1 (Mar 2024), p. 73
Main Author: Chen, Yuxing
Other Authors: Hoque, Mohammad A., Xu, Pengfei, Lu, Jiaheng, Tarkoma, Sasu
Published: Springer Nature B.V.
Description
Abstract: Spark is one of the most popular big data analytical platforms. To save time, achieve high resource utilization, and remain cost-effective for Spark jobs, it is challenging but imperative for data scientists to configure suitable resource portions. In this paper, we investigate the proper parameter values that meet workloads’ performance requirements with minimized resource cost and resource utilization time. We propose SimCost, a simulation-based cost model, to predict the performance of jobs accurately. We achieve low-cost training by taking advantage of a simulation framework, i.e., Monte Carlo simulation, which uses a small amount of data and resources to make a reliable prediction for larger datasets and clusters. Our method’s salient feature is that it keeps training costs low while still obtaining an accurate prediction. Through empirical experiments with 12 benchmark workloads, we show that the cost model yields an average prediction error of less than 5%, and the recommendation achieves up to 6x resource cost saving.
ISSN: 0926-8782
1573-7578
DOI: 10.1007/s10619-023-07436-y
Source: Advanced Technologies & Aerospace Database