Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory

Bibliographic Details
Published in: Journal of Computer Science and Technology vol. 34, no. 1 (Jan 2019), p. 94
Main Author: Hong, Yang
Other Authors: Zheng, Yang; Yang, Fan; Zang, Bin-Yu; Guan, Hai-Bing; Chen, Hai-Bo
Published: Springer Nature B.V.
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 2918609996
003 UK-CbPIL
022 |a 1000-9000 
022 |a 1860-4749 
024 7 |a 10.1007/s11390-019-1901-4  |2 doi 
035 |a 2918609996 
045 2 |b d20190101  |b d20190131 
084 |a 137755  |2 nlm 
100 1 |a Hong, Yang  |u Shanghai Jiao Tong University, Shanghai Key Laboratory for Scalable Computing Systems, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293) 
245 1 |a Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory 
260 |b Springer Nature B.V.  |c Jan 2019 
513 |a Journal Article 
520 3 |a The multicore evolution has stimulated renewed interests in scaling up applications on shared-memory multiprocessors, significantly improving the scalability of many applications. But the scalability is limited within a single node; therefore programmers still have to redesign applications to scale out over multiple nodes. This paper revisits the design and implementation of distributed shared memory (DSM) as a way to scale out applications optimized for non-uniform memory access (NUMA) architecture over a well-connected cluster. This paper presents MAGI, an efficient DSM system that provides a transparent shared address space with scalable performance on a cluster with fast network interfaces. MAGI is unique in that it presents a NUMA abstraction to fully harness the multicore resources in each node through hierarchical synchronization and memory management. MAGI also exploits the memory access patterns of big-data applications and leverages a set of optimizations for remote direct memory access (RDMA) to reduce the number of page faults and the cost of the coherence protocol. MAGI has been implemented as a user-space library with pthread-compatible interfaces and can run existing multithreaded applications with minimized modifications. We deployed MAGI over an 8-node RDMA-enabled cluster. Experimental evaluation shows that MAGI achieves up to 9.25x speedup compared with an unoptimized implementation, leading to a scalable performance for large-scale data-intensive applications. 
653 |a Synchronism 
653 |a Big Data 
653 |a Clusters 
653 |a Multiprocessing 
653 |a Distributed memory 
653 |a Redesign 
653 |a Memory management 
653 |a Nodes 
653 |a Design 
653 |a Software 
653 |a Information sharing 
653 |a Efficiency 
653 |a Commodities 
700 1 |a Zheng, Yang  |u Shanghai Jiao Tong University, Shanghai Key Laboratory for Scalable Computing Systems, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293) 
700 1 |a Yang, Fan  |u Shanghai Jiao Tong University, Shanghai Key Laboratory for Scalable Computing Systems, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293) 
700 1 |a Zang, Bin-Yu  |u Shanghai Jiao Tong University, Shanghai Key Laboratory for Scalable Computing Systems, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293) 
700 1 |a Guan, Hai-Bing  |u Shanghai Jiao Tong University, Shanghai Key Laboratory for Scalable Computing Systems, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293) 
700 1 |a Chen, Hai-Bo  |u Shanghai Jiao Tong University, Shanghai Key Laboratory for Scalable Computing Systems, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293) 
773 0 |t Journal of Computer Science and Technology  |g vol. 34, no. 1 (Jan 2019), p. 94 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2918609996/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/2918609996/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch