Efficient Scheduling for GPU-Based Neural Network Training via Hybrid Reinforcement Learning and Metaheuristic Optimization
| Published in: | Big Data and Cognitive Computing vol. 9, no. 11 (2025), p. 284-325 |
|---|---|
| Main author: | Du, Nana |
| Other authors: | Wu, Chase; Hou Aiqin; Nie Weike; Song Ruiqi |
| Published: | MDPI AG |
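| Subjects: | Scheduling; Performance measurement; Neural networks; Integer programming; Costs; Graph theory; Optimization; Decision making; Workload; Algorithms; Quality of service; Machine learning; Heuristic; Completion time; Workloads; Heterogeneity; Heuristic methods; Run time (computers) |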
| Online access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |
MARC
| Field | Ind. 1 | Ind. 2 | Content |
|---|---|---|---|
| LEADER | | | 00000nab a2200000uu 4500 |
| 001 | | | 3275500653 |
| 003 | | | UK-CbPIL |
| 022 | | | \|a 2504-2289 |
| 024 | 7 | | \|a 10.3390/bdcc9110284 \|2 doi |
| 035 | | | \|a 3275500653 |
| 045 | 2 | | \|b d20250101 \|b d20251231 |
| 100 | 1 | | \|a Du, Nana \|u School of Computer, Northwest University, Xi’an 710100, China; dunana@stumail.nwu.edu.cn (N.D.); ruiqi_song@stumail.nwu.edu.cn (R.S.) |
| 245 | 1 | | \|a Efficient Scheduling for GPU-Based Neural Network Training via Hybrid Reinforcement Learning and Metaheuristic Optimization |
| 260 | | | \|b MDPI AG \|c 2025 |
| 513 | | | \|a Journal Article |
| 520 | 3 | | \|a On GPU-based clusters, the training workloads of machine learning (ML) models, particularly neural networks (NNs), are often structured as Directed Acyclic Graphs (DAGs) and typically deployed for parallel execution across heterogeneous GPU resources. Efficient scheduling of these workloads is crucial for optimizing performance metrics such as execution time, under various constraints including GPU heterogeneity, network capacity, and data dependencies. DAG-structured ML workload scheduling could be modeled as a Nonlinear Integer Program (NIP) problem, and is shown to be NP-complete. By leveraging a positive correlation between Scheduling Plan Distance (SPD) and Finish Time Gap (FTG) identified through an empirical study, we propose to develop a Running Time Gap Strategy for scheduling based on Whale Optimization Algorithm (WOA) and Reinforcement Learning, referred to as WORL-RTGS. The proposed method integrates the global search capabilities of WOA with the adaptive decision-making of Double Deep Q-Networks (DDQN). Particularly, we derive a novel function to generate effective scheduling plans using DDQN, enhancing adaptability to complex DAG structures. Comprehensive evaluations on practical ML workload traces collected from Alibaba on simulated GPU-enabled platforms demonstrate that WORL-RTGS significantly improves WOA’s stability for DAG-structured ML workload scheduling and reduces completion time by up to 66.56% compared with five state-of-the-art scheduling algorithms. |
| 653 | | | \|a Scheduling |
| 653 | | | \|a Performance measurement |
| 653 | | | \|a Neural networks |
| 653 | | | \|a Integer programming |
| 653 | | | \|a Costs |
| 653 | | | \|a Graph theory |
| 653 | | | \|a Optimization |
| 653 | | | \|a Decision making |
| 653 | | | \|a Workload |
| 653 | | | \|a Algorithms |
| 653 | | | \|a Quality of service |
| 653 | | | \|a Machine learning |
| 653 | | | \|a Heuristic |
| 653 | | | \|a Completion time |
| 653 | | | \|a Workloads |
| 653 | | | \|a Heterogeneity |
| 653 | | | \|a Heuristic methods |
| 653 | | | \|a Run time (computers) |
| 700 | 1 | | \|a Wu, Chase \|u Department of Data Science, New Jersey Institute of Technology, Newark, NJ 07102, USA; chase.wu@njit.edu |
| 700 | 1 | | \|a Hou Aiqin \|u School of Computer, Northwest University, Xi’an 710100, China; dunana@stumail.nwu.edu.cn (N.D.); ruiqi_song@stumail.nwu.edu.cn (R.S.) |
| 700 | 1 | | \|a Nie Weike \|u School of Computer, Northwest University, Xi’an 710100, China; dunana@stumail.nwu.edu.cn (N.D.); ruiqi_song@stumail.nwu.edu.cn (R.S.) |
| 700 | 1 | | \|a Song Ruiqi \|u School of Computer, Northwest University, Xi’an 710100, China; dunana@stumail.nwu.edu.cn (N.D.); ruiqi_song@stumail.nwu.edu.cn (R.S.) |
| 773 | 0 | | \|t Big Data and Cognitive Computing \|g vol. 9, no. 11 (2025), p. 284-325 |
| 786 | 0 | | \|d ProQuest \|t Advanced Technologies & Aerospace Database |
| 856 | 4 | 1 | \|3 Citation/Abstract \|u https://www.proquest.com/docview/3275500653/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text + Graphics \|u https://www.proquest.com/docview/3275500653/fulltextwithgraphics/embedded/6A8EOT78XXH2IG52?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text - PDF \|u https://www.proquest.com/docview/3275500653/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch |
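
The abstract in field 520 relates two quantities, Scheduling Plan Distance (SPD) and Finish Time Gap (FTG), but this record does not define either one. The sketch below is only an illustration under stated assumptions: SPD is taken to be the Hamming distance between two task-to-GPU assignments, FTG is taken to be the absolute difference of their finish times, and the timing model (list scheduling in topological order with a fixed cross-GPU transfer delay) is a simplification that is not taken from the paper. The names `finish_time`, `plan_distance`, and `comm_delay` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def finish_time(deps, work, speed, plan, comm_delay=0.0):
    """Finish time (makespan) of a plan that maps each task to one GPU.

    deps:       task -> set of predecessor tasks (the DAG)
    work:       task -> amount of computation
    speed:      GPU -> processing speed (heterogeneous GPUs differ here)
    plan:       task -> GPU chosen for that task
    comm_delay: assumed fixed delay when a predecessor ran on another GPU
    """
    gpu_free = {gpu: 0.0 for gpu in speed}  # time at which each GPU is idle
    done = {}                               # task -> completion time
    for task in TopologicalSorter(deps).static_order():
        gpu = plan[task]
        ready = 0.0
        for pred in deps.get(task, ()):
            delay = comm_delay if plan[pred] != gpu else 0.0
            ready = max(ready, done[pred] + delay)
        start = max(ready, gpu_free[gpu])
        done[task] = start + work[task] / speed[gpu]
        gpu_free[gpu] = done[task]
    return max(done.values())


def plan_distance(plan_a, plan_b):
    """Assumed SPD: number of tasks placed on different GPUs."""
    return sum(plan_a[task] != plan_b[task] for task in plan_a)


# Toy diamond-shaped DAG: a -> {b, c} -> d, on two GPUs of unequal speed.
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
work = {"a": 4, "b": 8, "c": 8, "d": 4}
speed = {"g0": 2.0, "g1": 4.0}

plan_x = {"a": "g0", "b": "g0", "c": "g0", "d": "g0"}  # everything on g0
plan_y = {"a": "g0", "b": "g0", "c": "g1", "d": "g0"}  # move task c to g1

t_x = finish_time(deps, work, speed, plan_x, comm_delay=1.0)  # 12.0
t_y = finish_time(deps, work, speed, plan_y, comm_delay=1.0)  # 8.0
spd = plan_distance(plan_x, plan_y)                           # 1
ftg = abs(t_x - t_y)                                          # 4.0
```

In this toy case, moving a single task (distance 1) changes the finish time by 4.0. A search method such as the one the abstract describes would compare many candidate plans in this way; the sketch covers only the evaluation step, not the WOA or DDQN components.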