Asynchronous Many-Task System With Intelligent Scheduling

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Chiu, Cheng-Hsiang
Published:
ProQuest Dissertations & Theses
Subjects: Computer engineering; Computer science; Electrical engineering
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3243773989
003 UK-CbPIL
020 |a 9798291560310 
035 |a 3243773989 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Chiu, Cheng-Hsiang 
245 1 |a Asynchronous Many-Task System With Intelligent Scheduling 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a This thesis addresses critical challenges in parallel computing and task scheduling, focusing on task-parallel programming and task graph scheduling. The thesis has six chapters. The first three present task-parallel programming models: Pipeflow, token dependency-aware Pipeflow, and AsyncTask. The last three cover reinforcement learning-based task graph scheduling: resource-efficient task scheduling, topological ordering for task graphs, and CUDA Graph scheduling optimization. (1) An Efficient Task-parallel Pipeline Programming Framework. The pipeline is a fundamental pattern for parallelizing a series of stage tasks over a sequence of data in loops. Mainstream libraries rely on data abstractions to schedule pipeline tasks, which complicates the scheduling design and is inefficient for applications that need only task parallelism. To address this challenge, Pipeflow decouples task scheduling from data abstractions and introduces a lightweight scheduling policy to efficiently exploit pipeline parallelism in an application. We have demonstrated that Pipeflow is up to 110.33% faster than existing libraries. This work significantly reduces runtime, providing a crucial solution for pipeline applications that exploit only task parallelism. In this work, Cheng-Hsiang Chiu was the primary contributor, responsible for the majority of the research and development efforts. Zhicheng Xiong and Zizheng Guo provided design ideas. Yibo Lin and Tsung-Wei Huang supervised the research, providing guidance and oversight throughout the work. All authors participated in discussing the results and contributed to the preparation and review of this chapter. (2) A Task-parallel Pipeline Programming Model with Token Dependency. Task-parallel pipeline frameworks explore pipeline parallelism in applications and are critical in many parallel and heterogeneous areas. 
However, existing solutions cannot handle applications in which data dependencies occur in both forward and backward directions. To address this need, we have extended Pipeflow to support an application’s bidirectional token dependencies through an expressive programming model and lightweight atomic counters that resolve token dependencies. We have demonstrated that our token dependency-aware Pipeflow is 8.6% faster than the existing implementation in video encoding applications. This work showcases Pipeflow’s capability to explore pipeline parallelism across various pipeline applications. In this work, Cheng-Hsiang Chiu was the primary contributor, responsible for the majority of the research and development efforts. Tsung-Wei Huang supervised the research, providing guidance and oversight throughout the work. Wan-Luan Lee, Boyang Zhang, YiHua Chung, and Che Chang provided design ideas. All authors participated in discussing the results and contributed to the preparation and review of this chapter. (3) Programming Dynamic Task Parallelism for Heterogeneous EDA Algorithms. Parallelizing EDA applications that are extremely sparse, irregular, and control-flow intensive can benefit from the ability to express dynamic task parallelism across arbitrary decision-making points at runtime. However, existing libraries describe dynamic task dependencies indirectly and rely on lock-based data structures to schedule tasks. To address this challenge, we have introduced AsyncTask to support the dynamic building of a computational task graph. We have demonstrated that AsyncTask is up to 3.19× faster than existing libraries. This work presents a direct description of tasks, improves code readability, and develops an efficient scheduling algorithm, which is crucial for complex and irregular EDA applications. In this work, Cheng-Hsiang Chiu was the primary contributor, responsible for the majority of the research and development efforts. Dian-Lun Lin provided design ideas. 
Tsung-Wei Huang supervised the research, providing guidance and oversight throughout the work. All authors participated in discussing the results and contributed to the preparation and review of this chapter. (4) A Resource-efficient Task Scheduling System using Reinforcement Learning. Efficiently scheduling millions of functional tasks of EDA applications in a computing environment that comprises manycore CPUs and GPUs is critically important. However, existing scheduling methods are typically hardcoded within an application and do not adapt to changes in the computing environment. To address this issue, we have introduced a novel reinforcement learning-based scheduling algorithm that can learn to adapt the performance optimization to a given runtime situation. We have demonstrated that our scheduling algorithm can achieve the same performance as the existing methods while using only 20% of CPU resources. This work highlights the capability of our algorithm to maintain the same runtime performance across concurrent workloads without performance degradation, which is crucial for EDA applications. In this work, Cheng-Hsiang Chiu and Chedi Morchdi were both primary contributors, responsible for the majority of the research and development efforts. Yi Zhou and Tsung-Wei Huang supervised the research, providing guidance and oversight throughout the work. All authors participated in discussing the results and contributed to the preparation and review of this chapter. (5) Reinforcement Learning-generated Topological Order for Dynamic Task Graph Scheduling. Dynamic task graph scheduling (DTGS) allows applications to define the task graph structure on the fly, enabling concurrent task creation and task execution. To schedule tasks, DTGS relies on applications to define a topological order for the task graph. 
However, existing algorithms that generate this order primarily rely on heuristics such as level-by-level sorting, which lack adaptability to dynamic computing environments. To address this need, we have introduced a novel method that leverages reinforcement learning to generate topological orders for DTGS systems. We have demonstrated that our method achieves a speedup of up to 1.52× over the existing solutions. This work is essential for task-parallel runtimes that employ diverse work stealing policies to support a broader range of applications. In this work, Cheng-Hsiang Chiu was the primary contributor, responsible for the majority of the research and development efforts. Chedi Morchdi, Boyang Zhang, and Che Chang provided design ideas. Yi Zhou and Tsung-Wei Huang supervised the research, providing guidance and oversight throughout the work. All authors participated in discussing the results and contributed to the preparation and review of this chapter. (6) Optimizing CUDA Graph Scheduling with Reinforcement Learning: A Case Study in SSTA Propagation. CUDA Graph has shown potential in recent GPU-accelerated statistical static timing analysis (SSTA) propagation applications. However, application-given CUDA graphs are often suboptimal, as they focus on capturing circuit structures while overlooking GPU resource availability and scheduling constraints. To address this challenge, we have introduced a reinforcement learning-based framework that optimizes CUDA graphs by learning to restructure SSTA graphs through interactions with the CUDA Graph runtime. We have demonstrated that our optimized CUDA graph can achieve up to a 12% runtime improvement over the application-given CUDA graph. 
This work is important for CUDA Graph applications because our framework requires no changes to application-level algorithms; instead, it restructures the given CUDA graph to guide the CUDA runtime toward better scheduling performance. In this work, Cheng-Hsiang Chiu was the primary contributor, responsible for the majority of the research and development efforts. Chih-Chun Chang designed the problem formulation. Chedi Morchdi provided design ideas. Cunxi Yu, Yi Zhou, and Tsung-Wei Huang supervised the research, providing guidance and oversight throughout the work. All authors participated in discussing the results and contributed to the preparation and review of this chapter. 
653 |a Computer engineering 
653 |a Computer science 
653 |a Electrical engineering 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3243773989/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3243773989/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch