Performance Debugging through Microarchitectural Sensitivity and Causality Analysis

Bibliographic details
Published in: arXiv.org (Dec 3, 2024), p. n/a
Main author: Dutilleul, Alban
Other authors: Pompougnac, Hugo; Derumigny, Nicolas; Rodriguez, Gabriel; Trophime, Valentin; Guillon, Christophe; Rastello, Fabrice
Published:
Cornell University Library, arXiv.org
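Subjects: Central processing units--CPUs; Parameterization; Performance evaluation; Computer architecture; Sensitivity analysis; Hardware; Debugging; Parameter sensitivity; Task complexity; Complex systems; Resource utilization; Constraints; Software; Benchmarks; Bottlenecks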
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3147264633
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3147264633 
045 0 |b d20241203 
100 1 |a Dutilleul, Alban 
245 1 |a Performance Debugging through Microarchitectural Sensitivity and Causality Analysis 
260 |b Cornell University Library, arXiv.org  |c Dec 3, 2024 
513 |a Working Paper 
520 3 |a Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical to fully exploiting the performance offered by hardware resources. Current performance debugging approaches rely either on measuring resource utilization, in order to estimate which parts of a CPU induce performance limitations, or on code-based analysis that derives bottleneck information from capacity/throughput models. These approaches are limited by instrumental and methodological precision, present portability constraints across different microarchitectures, and often offer factual information about resource constraints but no causal hints about how to resolve them. This paper presents a novel performance debugging and analysis tool that implements a resource-centric CPU model driven by dynamic binary instrumentation and is capable of detecting complex bottlenecks caused by an interplay of hardware and software factors. Bottlenecks are detected through sensitivity-based analysis, a form of model parameterization that uses differential analysis to reveal constrained resources. The tool also implements a new technique we developed, which we call causality analysis, that propagates constraints to pinpoint how each instruction contributes to the overall execution time. To evaluate our analysis tool, we considered a set of high-performance computing kernels obtained by applying a wide range of transformations to the Polybench benchmark suite and measured the precision on a few Intel CPU and Arm microarchitectures. We also took one of the benchmarks (correlation) as an example to illustrate how our tool's bottleneck analysis can be used to optimize code. 
653 |a Central processing units--CPUs 
653 |a Parameterization 
653 |a Performance evaluation 
653 |a Computer architecture 
653 |a Sensitivity analysis 
653 |a Hardware 
653 |a Debugging 
653 |a Parameter sensitivity 
653 |a Task complexity 
653 |a Complex systems 
653 |a Resource utilization 
653 |a Constraints 
653 |a Software 
653 |a Benchmarks 
653 |a Bottlenecks 
700 1 |a Pompougnac, Hugo 
700 1 |a Derumigny, Nicolas 
700 1 |a Rodriguez, Gabriel 
700 1 |a Trophime, Valentin 
700 1 |a Guillon, Christophe 
700 1 |a Rastello, Fabrice 
773 0 |t arXiv.org  |g (Dec 3, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3147264633/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.13207