Performance Debugging through Microarchitectural Sensitivity and Causality Analysis
| Published in: | arXiv.org (Dec 3, 2024), p. n/a |
|---|---|
| Main author: | Dutilleul, Alban |
| Other authors: | Pompougnac, Hugo; Derumigny, Nicolas; Rodriguez, Gabriel; Trophime, Valentin; Guillon, Christophe; Rastello, Fabrice |
| Published: | Cornell University Library, arXiv.org |
| Online access: | Citation/Abstract; Full text outside of ProQuest |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3147264633 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2331-8422 | ||
| 035 | |a 3147264633 | ||
| 045 | 0 | |b d20241203 | |
| 100 | 1 | |a Dutilleul, Alban | |
| 245 | 1 | |a Performance Debugging through Microarchitectural Sensitivity and Causality Analysis | |
| 260 | |b Cornell University Library, arXiv.org |c Dec 3, 2024 | ||
| 513 | |a Working Paper | ||
| 520 | 3 | |a Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to fully exploit the performance offered by hardware resources. Current performance debugging approaches rely either on measuring resource utilization, in order to estimate which parts of a CPU induce performance limitations, or on code-based analysis deriving bottleneck information from capacity/throughput models. These approaches are limited by instrumental and methodological precision, present portability constraints across different microarchitectures, and often offer factual information about resource constraints, but not causal hints about how to solve them. This paper presents a novel performance debugging and analysis tool that implements a resource-centric CPU model driven by dynamic binary instrumentation that is capable of detecting complex bottlenecks caused by an interplay of hardware and software factors. Bottlenecks are detected through sensitivity-based analysis, a sort of model parameterization that uses differential analysis to reveal constrained resources. It also implements a new technique we developed that we call causality analysis, which propagates constraints to pinpoint how each instruction contributes to the overall execution time. To evaluate our analysis tool, we considered the set of high-performance computing kernels obtained by applying a wide range of transformations to the Polybench benchmark suite and measured the precision on a few Intel and Arm microarchitectures. We also took one of the benchmarks (correlation) as an example to illustrate how our tool's bottleneck analysis can be used to optimize a code. | |
| 653 | |a Central processing units--CPUs | ||
| 653 | |a Parameterization | ||
| 653 | |a Performance evaluation | ||
| 653 | |a Computer architecture | ||
| 653 | |a Sensitivity analysis | ||
| 653 | |a Hardware | ||
| 653 | |a Debugging | ||
| 653 | |a Parameter sensitivity | ||
| 653 | |a Task complexity | ||
| 653 | |a Complex systems | ||
| 653 | |a Resource utilization | ||
| 653 | |a Constraints | ||
| 653 | |a Software | ||
| 653 | |a Benchmarks | ||
| 653 | |a Bottlenecks | ||
| 700 | 1 | |a Pompougnac, Hugo | |
| 700 | 1 | |a Derumigny, Nicolas | |
| 700 | 1 | |a Rodriguez, Gabriel | |
| 700 | 1 | |a Trophime, Valentin | |
| 700 | 1 | |a Guillon, Christophe | |
| 700 | 1 | |a Rastello, Fabrice | |
| 773 | 0 | |t arXiv.org |g (Dec 3, 2024), p. n/a | |
| 786 | 0 | |d ProQuest |t Engineering Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3147264633/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch |
| 856 | 4 | 0 | |3 Full text outside of ProQuest |u http://arxiv.org/abs/2412.13207 |
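The abstract describes sensitivity-based analysis as a model parameterization that uses differential analysis to reveal constrained resources. The sketch below illustrates only that general idea and is not the authors' tool: it assumes a hypothetical toy bottleneck model in which execution time is the maximum, over resources, of demand divided by capacity. The resource names (`alu`, `load_port`, `frontend`), demands, capacities, and the helper functions `estimate_cycles` and `sensitivity` are all illustrative assumptions. Each resource's capacity is scaled in isolation, and a resource whose scaling shortens the estimate is flagged as constraining.

```python
# Hypothetical sketch: differential (sensitivity) analysis over a toy
# bottleneck model. Not the paper's tool; names and numbers are assumed.
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Resource:
    name: str
    capacity: float  # work units the resource can absorb per cycle (assumed)


def estimate_cycles(demand: dict[str, float], resources: list[Resource]) -> float:
    """Toy throughput model: the most saturated resource bounds execution time."""
    return max(demand[r.name] / r.capacity for r in resources)


def sensitivity(demand: dict[str, float], resources: list[Resource],
                scale: float = 2.0) -> dict[str, float]:
    """Speedup obtained by scaling one resource's capacity at a time.

    A value above 1.0 means the estimate is sensitive to that resource,
    i.e. it is (one of) the constraining resources.
    """
    baseline = estimate_cycles(demand, resources)
    speedups: dict[str, float] = {}
    for i, res in enumerate(resources):
        perturbed = list(resources)
        perturbed[i] = replace(res, capacity=res.capacity * scale)
        speedups[res.name] = baseline / estimate_cycles(demand, perturbed)
    return speedups


# Hypothetical per-kernel demand, e.g. micro-ops routed to each resource.
demand = {"alu": 400.0, "load_port": 300.0, "frontend": 250.0}
resources = [
    Resource("alu", 4.0),
    Resource("load_port", 2.0),
    Resource("frontend", 4.0),
]

if __name__ == "__main__":
    print(estimate_cycles(demand, resources))  # 150.0
    print(sensitivity(demand, resources))
    # {'alu': 1.0, 'load_port': 1.5, 'frontend': 1.0}
```

In this toy example only `load_port` yields a speedup when its capacity is doubled, so it is reported as the constrained resource; doubling `alu` or `frontend` leaves the estimate unchanged. The tool described in the abstract applies the idea to a far richer resource-centric OoO CPU model driven by dynamic binary instrumentation.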