Unleashing Multi-GPU Computing to the Next-Level

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Li, Bingyao
Published: ProQuest Dissertations & Theses
Description
Abstract: In the past decade, Graphics Processing Units (GPUs) have rapidly evolved into one of the most popular computing platforms, providing significant acceleration for machine learning, graph processing, scientific computing, and VR/AR. Ever-growing application complexity and input dataset sizes have made multi-GPU systems increasingly attractive computing platforms. This trend is also evident in modern computing infrastructures and data centers; for example, nine of the top ten supercomputers are equipped with multiple GPUs per node. While employing multiple GPUs intuitively offers aggregated memory capacity and combined computational parallelism, these increased resources rarely translate into tangible application benefits (e.g., performance and quality of service). This discrepancy arises from several factors, such as inefficient address translation, non-uniform memory accesses, inter-GPU communication overheads, and load imbalance among the GPUs. Consequently, two critical questions remain unaddressed: How should multi-GPU computing architectures be designed? How can emerging applications harness the advantages of multiple GPUs?

This thesis is motivated by these two questions and aims to advance the deployment of multi-GPU systems in modern computing. It pioneers several distinct directions in architectural and system-level design toward fully exploiting multi-GPU capabilities. First, the thesis redesigns the TLB hierarchy and proposes i) a “least-inclusive” TLB hierarchy and ii) hardware-supported sharing of address translations with peer GPUs. Second, it uncovers bottlenecks and explores opportunities in page table walking (PTW) in multi-GPU systems. Third, it investigates the effects of frequent invalidations caused by page migration in multi-GPU systems and proposes a software-hardware co-design that mitigates the page table invalidation overhead and improves overall application performance.
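The general shape of a translation path in which a GPU can consult peer GPUs before walking the page table can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the thesis's design: the names `Gpu`, `translate`, and `PAGE_SIZE`, the 4 KiB page size, and the probe order (local L1, local L2, peer L2 TLBs, then PTW) are all hypothetical, and the model ignores TLB capacity, eviction, inclusion policy, and latency.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class Gpu:
    """Toy GPU with two private TLB levels, each a dict mapping VPN -> PFN."""
    name: str
    l1_tlb: dict = field(default_factory=dict)
    l2_tlb: dict = field(default_factory=dict)


def translate(gpu, peers, page_table, vaddr):
    """Translate vaddr for `gpu`; return (physical address, level that served it).

    Hypothetical lookup order: local L1 -> local L2 -> peer L2 TLBs -> page table walk.
    No capacity limits or eviction are modeled.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in gpu.l1_tlb:
        return gpu.l1_tlb[vpn] * PAGE_SIZE + offset, "L1"
    if vpn in gpu.l2_tlb:
        level, pfn = "L2", gpu.l2_tlb[vpn]
    else:
        # Probe peer GPUs' L2 TLBs before walking the page table (assumed sharing scheme).
        peer = next((p for p in peers if vpn in p.l2_tlb), None)
        if peer is not None:
            level, pfn = "peer:" + peer.name, peer.l2_tlb[vpn]
        else:
            # Page table walk, modeled here as a single dict lookup.
            level, pfn = "PTW", page_table[vpn]
        gpu.l2_tlb[vpn] = pfn  # fill the local L2 TLB on an L2 miss
    gpu.l1_tlb[vpn] = pfn  # fill the local L1 TLB on any L1 miss
    return pfn * PAGE_SIZE + offset, level


page_table = {0x10: 0x200}
gpu0, gpu1 = Gpu("gpu0"), Gpu("gpu1")
va = 0x10 * PAGE_SIZE + 8
print(translate(gpu0, [gpu1], page_table, va))  # (2097160, 'PTW')
print(translate(gpu1, [gpu0], page_table, va))  # (2097160, 'peer:gpu0')
print(translate(gpu1, [gpu0], page_table, va))  # (2097160, 'L1')
```

In this toy run, the second GPU avoids its own page table walk because the first GPU already holds the translation, which is the kind of redundancy that translation sharing among peers aims to exploit.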
Finally, in multi-tenant environments, TLB sub-entries are often underutilized because of interference among tenants. The thesis proposes a shared-aware sub-entry technique to improve their utilization.
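How multi-tenancy can leave TLB sub-entries underused can be illustrated with a toy Python model. It assumes a hypothetical fully associative TLB with LRU replacement in which each entry's tag covers a group of `SUBS_PER_ENTRY = 4` consecutive virtual pages of one tenant, with one sub-entry per page. The class `SubEntryTlb`, the helper `run`, and all sizes are illustrative assumptions; the sketch models only the baseline problem, not the thesis's shared-aware technique.

```python
from collections import OrderedDict

SUBS_PER_ENTRY = 4  # assumption: one entry tag covers 4 consecutive pages


class SubEntryTlb:
    """Fully associative TLB whose entries hold per-page sub-entries (LRU replacement)."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        # (tenant, page group) -> set of filled sub-entry indices, kept in LRU order
        self.entries = OrderedDict()

    def access(self, tenant, vpn):
        """Look up one page; return True on a sub-entry hit, filling it on a miss."""
        key = (tenant, vpn // SUBS_PER_ENTRY)
        sub = vpn % SUBS_PER_ENTRY
        if key in self.entries:
            self.entries.move_to_end(key)  # mark entry as most recently used
            hit = sub in self.entries[key]
        else:
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)  # evict the least recently used entry
            self.entries[key] = set()
            hit = False
        self.entries[key].add(sub)
        return hit

    def utilization(self):
        """Fraction of all sub-entry slots that currently hold a translation."""
        filled = sum(len(subs) for subs in self.entries.values())
        return filled / (self.num_entries * SUBS_PER_ENTRY)


def run(pattern, num_entries=2):
    tlb = SubEntryTlb(num_entries)
    for tenant, vpn in pattern:
        tlb.access(tenant, vpn)
    return tlb.utilization()


solo = [("T", vpn) for vpn in range(12)]                       # one tenant streaming
mixed = [(t, vpn) for vpn in range(4) for t in ("A", "B", "C")]  # three tenants interleaved
print(run(solo))   # 1.0
print(run(mixed))  # 0.25
```

With a single tenant, each entry fills all four sub-entries before it is replaced. When three tenants interleave on a two-entry TLB, every entry is evicted after holding only one translation, so three quarters of the sub-entry capacity sits empty.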
ISBN:9798293818518
Source: ProQuest Dissertations & Theses Global