Shared memory optimizations for distributed memory programming models

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2013)
Main Author: Friedley, Andrew
Published:
ProQuest Dissertations & Theses
Subjects: Computer science
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 1494127403
003 UK-CbPIL
020 |a 978-1-303-65920-1 
035 |a 1494127403 
045 0 |b d20130101 
084 |a 66569  |2 nlm 
100 1 |a Friedley, Andrew 
245 1 |a Shared memory optimizations for distributed memory programming models 
260 |b ProQuest Dissertations & Theses  |c 2013 
513 |a Dissertation/Thesis 
520 3 |a In the world of parallel programming, there are two major classes of programming models: shared memory and distributed memory. Shared memory models share all memory by default, and are most effective on multi-processor systems. Distributed memory models separate memory into distinct regions for each execution context and are most effective on a network of processors. Modern and future High Performance Computing (HPC) systems will contain multi- and many-core processors connected by a network, resulting in a hybrid shared and distributed memory environment. Neither programming model is ideal in both areas. Now and in the future, optimizing parallel performance for both memory models simultaneously is a major challenge. MPI (Message Passing Interface) is the de facto standard for distributed memory programming, but results in less than ideal performance when used in a shared memory environment. Message passing incurs overhead in the form of unnecessary data copying as well as specific queuing, ordering, and matching rules. In this thesis, we present a series of techniques that optimize MPI performance in a shared memory environment, thus helping to solve the challenge of optimizing parallel performance for both distributed and shared memory. We introduce the concept of a shared memory heap, in which dynamically allocated memory is shared by default among all MPI processes within a node. We then use that heap to transparently optimize message passing with two new data transfer protocols. Next, we propose an MPI extension for ownership passing, which eliminates data copying overheads completely. Instead of copying data, we transfer control (ownership) of communication buffers. Finally, we explore how shared memory techniques can be applied in the context of MPI and the shared memory heap. Loop fusion is a new technique for combining the packing and unpacking code on two different MPI ranks to eliminate explicit communication. All of these techniques are implemented in a freely available software library named Hybrid MPI (HMPI). We experimentally evaluate our work using a variety of micro-benchmarks and mini-applications. In the mini-applications, communication performance is improved by up to 46% by our data transfer protocols, 54% by ownership passing, and 63% by loop fusion. 
653 |a Computer science 
773 0 |t ProQuest Dissertations and Theses  |g (2013) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/1494127403/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/1494127403/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch
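
The shared memory heap described in the abstract is implemented inside HMPI; the sketch below is not HMPI's API or the thesis's implementation, only a minimal, standards-only illustration of the idea it builds on. It uses MPI-3 shared windows (MPI_Comm_split_type and MPI_Win_allocate_shared) so that ranks on the same node can read and write one another's buffers directly, which is the property that lets a transfer become an in-place write or a pointer hand-off rather than a copy. Variable names and the 1024-element slot size are illustrative assumptions.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Group the ranks that share a physical node. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);

    int rank, nranks;
    MPI_Comm_rank(node, &rank);
    MPI_Comm_size(node, &nranks);

    /* Each rank contributes a slot to one node-wide shared allocation. */
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &mine, &win);

    MPI_Win_fence(0, win);
    if (nranks > 1 && rank == 0) {
        /* Rank 0 writes directly into rank 1's slot: no send and no
         * receive-side copy, the effect a node-wide shared heap enables. */
        MPI_Aint size; int disp; double *peer;
        MPI_Win_shared_query(win, 1, &size, &disp, &peer);
        peer[0] = 42.0;
    }
    MPI_Win_fence(0, win);

    if (nranks > 1 && rank == 1)
        printf("rank 1 reads %.1f written in place by rank 0\n", mine[0]);

    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}

Ownership passing, as summarized in the abstract, goes one step further than this sketch: rather than writing into a peer's buffer, the sending rank relinquishes its buffer and the receiving rank adopts the same memory, so even the single in-place write above would be replaced by a pointer exchange.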