Optimizing Data Movement Performance and Energy Efficiency in Distributed Systems Under Shared Resource Constraints
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main Author: | Jamil, Md. Hasibul |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | Computer engineering; Computer science; Electrical engineering |
| Online Access: | Citation/Abstract, Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3250259443 | ||
| 003 | UK-CbPIL | ||
| 020 | |a 9798293833016 | ||
| 035 | |a 3250259443 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 66569 |2 nlm | ||
| 100 | 1 | |a Jamil, Md. Hasibul | |
| 245 | 1 | |a Optimizing Data Movement Performance and Energy Efficiency in Distributed Systems Under Shared Resource Constraints | |
| 260 | |b ProQuest Dissertations & Theses |c 2025 | ||
| 513 | |a Dissertation/Thesis | ||
| 520 | 3 | |a The extensive growth in data-intensive science and industrial analytics has magnified the importance of achieving high-throughput and energy-efficient data movement over heterogeneous networks and compute environments. Existing solutions for data movement often rely on static, one-size-fits-all parameter configurations that cannot adapt to fluctuations in network bandwidth, end-system contention, or filesystem performance demands. Consequently, these approaches either fail to maximize throughput or incur substantial energy overheads. In our research, we present a family of novel solutions that jointly optimize data movement performance and energy consumption through cross-layer adaptations, spanning the application layer, kernel configurations, and runtime environments. First, we propose a two-phase decision-tree-based framework for uncertainty reduction to optimize throughput and energy efficiency in data transfer applications. Its offline component clusters historical data transfer logs to identify robust application and kernel parameters; subsequently, an online algorithm adapts concurrency, parallelism, CPU core allocation, and frequency scaling based on real-time conditions. This cross-layer solution demonstrates up to 117% higher throughput and 19% lower energy consumption compared to traditional methods. Recognizing the high cost of gathering environment-specific historical data and the need for a dedicated application-level solution for wide adaptability, we further introduce learning-based approaches that generalize across diverse network conditions without relying on extensive historical logs. By incorporating Deep Reinforcement Learning (DRL) and multi-parameter optimization, these frameworks dynamically adjust the number of parallel TCP streams and application-layer concurrency, yielding up to 25% throughput gains and 40% energy savings while converging 40% faster than conventional algorithms. Fairness and congestion avoidance mechanisms are also integrated to maintain stable network performance across competing flows. Building on these cross-layer, energy-aware principles, we then apply a similar concept to distributed machine learning I/O with Efficient Machine Learning I/O (EMLIO). EMLIO co-locates lightweight daemons on storage nodes to pre-batch and serialize data shards from training data, moves data over multi-stream TCP/ZeroMQ channels, and integrates seamlessly with GPU-accelerated preprocessing (e.g., NVIDIA DALI). In our evaluations, EMLIO delivers up to 8.6x faster I/O and 10.9x lower energy consumption compared to state-of-the-art ML loaders, while maintaining constant performance and energy profiles irrespective of network distance. Beyond bulk data transfers, we investigate end-to-end scientific data streaming under near-real-time constraints. Our NUMA-aware runtime system aligns memory-intensive tasks (e.g., compression) with local memory domains, thereby delivering up to a 1.48x throughput improvement over state-of-the-art methods and a 2.6x speedup over conventional approaches. We also develop FlowTracer, a tool to detect and correct imbalances in equal-cost multi-path (ECMP) routing within leaf-spine networks, reducing path skew by 30% and alleviating throughput degradation, particularly for AI training workloads. Collectively, these contributions lay a robust groundwork for multi-objective optimization of data movement and distributed training in shared environments. By unifying cross-layer decision-tree methods, reinforcement-learning policies, energy-aware I/O services, NUMA-aware runtime designs, and multi-path route monitoring tools, they significantly enhance throughput, reduce energy costs, and maintain fairness in large-scale, heterogeneous workloads. | |
| 653 | |a Computer engineering | ||
| 653 | |a Computer science | ||
| 653 | |a Electrical engineering | ||
| 773 | 0 | |t ProQuest Dissertations and Theses |g (2025) | |
| 786 | 0 | |d ProQuest |t ProQuest Dissertations & Theses Global | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3250259443/abstract/embedded/CH9WPLCLQHQD1J4S?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3250259443/fulltextPDF/embedded/CH9WPLCLQHQD1J4S?source=fedsrch |