Toward Improving Productivity, Cost Effectiveness, and Sustainability of Large Scale Computing Systems
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main author: | |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | |
| Online access: | Citation/Abstract; Full Text - PDF |
| Abstract: | High performance computing (HPC) and cloud datacenters are seeing unprecedented growth in both capacity and user demand for computing resources. A modern datacenter can host up to 100,000 hardware compute nodes, may cost up to $1 billion to build, and can consume over 300 gigawatt hours (GWh) of energy annually, resulting in a carbon footprint of up to 150,000 metric tons of carbon dioxide (comparable to the emissions of more than 20,000 U.S. households). However, a programmer's ability to use complex datacenter hardware efficiently to achieve high performance for their applications has not scaled up proportionally -- in fact, some pessimistically suspect it may have worsened with increasing hardware complexity. Therefore, this dissertation poses a fundamental question (and goal): "Can we build system software tools to make large-scale computing systems more productive for programmers, more cost-effective for service providers, and more environmentally sustainable?" The dissertation designs and implements innovative real-system experimental proofs of concept to demonstrate that a set of elegant, and sometimes non-intuitive, strategies enables us to achieve this challenging goal. We demonstrate that intelligently leveraging serverless cloud computing (function-as-a-service) can make the execution of complex scientific workflows faster and more resource-efficient -- in contrast to the conventional practice of executing scientific workflows on on-premises, stateful HPC clusters rather than as stateless functions on cloud platforms. However, leveraging the serverless computing model and cloud computing resources is cost-prohibitive, difficult to optimize for performance, and carries a severe "hidden" carbon footprint burden.
This dissertation demonstrates that the unorthodox use of server heterogeneity (low-end and high-end hardware) and opportunistic scheduling can make serverless computing significantly more cost-effective. Toward lowering the productivity burden on programmers for cost-effective performance optimization, this dissertation demonstrates that an ensemble of lightweight, approximately accurate performance models and tuning methods can be more effective than building accurate and highly complex performance models and performance tuning strategies. Finally, this dissertation proposes the first carbon footprint accounting methodology and server-heterogeneity-inspired mitigation strategy for the serverless computing model -- revealing and reducing the high hidden embodied carbon footprint of keeping function code alive in server memory in anticipation of future invocations. We are hopeful that real-system open-source artifacts will accelerate innovation in this area and broader community engagement toward more productive, cost-effective, and sustainable HPC systems. |
| ISBN: | 9798314844472 |
| Source: | ProQuest Dissertations & Theses Global |