Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory

Bibliographic Information
Published in: Journal of Computer Science and Technology vol. 34, no. 1 (Jan 2019), p. 94
Main author: Yang, Hong
Other authors: Yang, Zheng; Yang, Fan; Zang, Bin-Yu; Guan, Hai-Bing; Chen, Hai-Bo
Published: Springer Nature B.V.
Subjects: Data management; Synchronism; Clusters; Resource management; Distributed shared memory; Servers; Distributed memory; Redesign; Memory management; Nodes
Online access: Citation/Abstract; Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 2171314603
003 UK-CbPIL
022 |a 1000-9000 
022 |a 1860-4749 
024 7 |a 10.1007/s11390-019-1901-4  |2 doi 
035 |a 2171314603 
045 2 |b d20190101  |b d20190131 
084 |a 137755  |2 nlm 
100 1 |a Yang, Hong  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
245 1 |a Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory 
260 |b Springer Nature B.V.  |c Jan 2019 
513 |a Journal Article 
520 3 |a The multicore evolution has stimulated renewed interest in scaling up applications on shared-memory multiprocessors, significantly improving the scalability of many applications. But such scalability is limited to a single node, so programmers still have to redesign applications to scale out over multiple nodes. This paper revisits the design and implementation of distributed shared memory (DSM) as a way to scale out applications optimized for non-uniform memory access (NUMA) architecture over a well-connected cluster. This paper presents MAGI, an efficient DSM system that provides a transparent shared address space with scalable performance on a cluster with fast network interfaces. MAGI is unique in that it presents a NUMA abstraction to fully harness the multicore resources in each node through hierarchical synchronization and memory management. MAGI also exploits the memory access patterns of big-data applications and leverages a set of optimizations for remote direct memory access (RDMA) to reduce the number of page faults and the cost of the coherence protocol. MAGI has been implemented as a user-space library with pthread-compatible interfaces and can run existing multithreaded applications with minimal modifications. We deployed MAGI over an 8-node RDMA-enabled cluster. Experimental evaluation shows that MAGI achieves up to 9.25x speedup compared with an unoptimized implementation, leading to scalable performance for large-scale data-intensive applications. 
653 |a Data management 
653 |a Synchronism 
653 |a Clusters 
653 |a Resource management 
653 |a Distributed shared memory 
653 |a Servers 
653 |a Distributed memory 
653 |a Redesign 
653 |a Memory management 
653 |a Nodes 
700 1 |a Yang, Zheng  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
700 1 |a Yang, Fan  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
700 1 |a Zang, Bin-Yu  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
700 1 |a Guan, Hai-Bing  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
700 1 |a Chen, Hai-Bo  |u Shanghai Key Laboratory for Scalable Computing Systems, Shanghai Jiao Tong University, Shanghai, China 
773 0 |t Journal of Computer Science and Technology  |g vol. 34, no. 1 (Jan 2019), p. 94 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2171314603/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/2171314603/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
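
To illustrate the abstract's claim that MAGI runs existing multithreaded applications through pthread-compatible interfaces, the sketch below is an ordinary POSIX-threads program of the kind such a DSM library targets. It uses no MAGI API (the paper's actual interface is not reproduced here); the thread count, array size, and the comments about RDMA paging are illustrative assumptions, not details from the paper.

    /* Hypothetical illustration: a plain pthread program that a
     * pthread-compatible DSM library like the one the abstract
     * describes could, in principle, run with minimal modification.
     * Under a DSM runtime, threads and the shared arrays would span
     * cluster nodes, with remote pages faulted in over RDMA. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8          /* illustrative thread count */
    #define N        (1 << 20)  /* illustrative shared-array size */

    static long data[N];        /* under DSM, part of the shared address space */
    static long partial[NTHREADS];

    static void *sum_range(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS, s = 0;
        for (long i = lo; i < hi; i++)
            s += data[i];       /* remote reads would trigger DSM page faults */
        partial[id] = s;
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++)
            data[i] = i & 0xff;                 /* populate shared data */
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, sum_range, (void *)i);
        long total = 0;
        for (long i = 0; i < NTHREADS; i++) {
            pthread_join(t[i], NULL);
            total += partial[i];
        }
        printf("sum = %ld\n", total);
        return 0;
    }

Linked against an ordinary libpthread this runs on a single machine; the abstract's point is that the same unmodified source, relinked against a pthread-compatible DSM runtime, could spread its threads and shared memory across the nodes of an RDMA-connected cluster.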