Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory

Bibliographic Information
Published in: Journal of Computer Science and Technology vol. 34, no. 1 (Jan 2019), p. 94
Main author: Yang, Hong
Other authors: Yang, Zheng; Yang, Fan; Zang, Bin-Yu; Guan, Hai-Bing; Chen, Hai-Bo
Published: Springer Nature B.V.
Subjects: Data management; Synchronism; Clusters; Resource management; Distributed shared memory; Servers; Distributed memory; Redesign; Memory management; Nodes
Online access: Citation/Abstract; Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 2171314603
003 UK-CbPIL
022 |a 1000-9000 
022 |a 1860-4749 
024 7 |a 10.1007/s11390-019-1901-4  |2 doi 
035 |a 2171314603 
045 2 |b d20190101  |b d20190131 
084 |a 137755  |2 nlm 
100 1 |a Yang, Hong  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
245 1 |a Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory 
260 |b Springer Nature B.V.  |c Jan 2019 
513 |a Journal Article 
520 3 |a The multicore evolution has stimulated renewed interest in scaling up applications on shared-memory multiprocessors, significantly improving the scalability of many applications. But such scalability is limited to a single node, so programmers still have to redesign applications to scale out over multiple nodes. This paper revisits the design and implementation of distributed shared memory (DSM) as a way to scale out applications optimized for non-uniform memory access (NUMA) architecture over a well-connected cluster. This paper presents MAGI, an efficient DSM system that provides a transparent shared address space with scalable performance on a cluster with fast network interfaces. MAGI is unique in that it presents a NUMA abstraction to fully harness the multicore resources in each node through hierarchical synchronization and memory management. MAGI also exploits the memory access patterns of big-data applications and leverages a set of optimizations for remote direct memory access (RDMA) to reduce the number of page faults and the cost of the coherence protocol. MAGI has been implemented as a user-space library with pthread-compatible interfaces and can run existing multithreaded applications with minimal modifications. We deployed MAGI over an 8-node RDMA-enabled cluster. Experimental evaluation shows that MAGI achieves up to 9.25x speedup compared with an unoptimized implementation, leading to scalable performance for large-scale data-intensive applications. 
653 |a Data management 
653 |a Synchronism 
653 |a Clusters 
653 |a Resource management 
653 |a Distributed shared memory 
653 |a Servers 
653 |a Distributed memory 
653 |a Redesign 
653 |a Memory management 
653 |a Nodes 
700 1 |a Yang, Zheng  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
700 1 |a Yang, Fan  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
700 1 |a Zang, Bin-Yu  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
700 1 |a Guan, Hai-Bing  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
700 1 |a Chen, Hai-Bo  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
773 0 |t Journal of Computer Science and Technology  |g vol. 34, no. 1 (Jan 2019), p. 94 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2171314603/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/2171314603/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
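
To illustrate the abstract's claim that MAGI runs existing multithreaded applications through pthread-compatible interfaces, the sketch below is an ordinary POSIX-threads program of the kind such a DSM library targets. It uses no MAGI API (the paper's actual interface is not reproduced here); the thread count, array size, and the comments about RDMA paging are illustrative assumptions, not details from the paper.

    /* Hypothetical illustration: a plain pthread program that a
     * pthread-compatible DSM library like the one the abstract
     * describes could, in principle, run with minimal modification.
     * Under a DSM runtime, threads and the shared arrays would span
     * cluster nodes, with remote pages faulted in over RDMA. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8          /* illustrative thread count */
    #define N        (1 << 20)  /* illustrative shared-array size */

    static long data[N];        /* under DSM, part of the shared address space */
    static long partial[NTHREADS];

    static void *sum_range(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS, s = 0;
        for (long i = lo; i < hi; i++)
            s += data[i];       /* remote reads would trigger DSM page faults */
        partial[id] = s;
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++)
            data[i] = i & 0xff;                 /* populate shared data */
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, sum_range, (void *)i);
        long total = 0;
        for (long i = 0; i < NTHREADS; i++) {
            pthread_join(t[i], NULL);
            total += partial[i];
        }
        printf("sum = %ld\n", total);
        return 0;
    }

Linked against an ordinary libpthread this runs on a single machine; the abstract's point is that the same unmodified source, relinked against a pthread-compatible DSM runtime, could spread its threads and shared memory across the nodes of an RDMA-connected cluster.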