Rethinking Storage System Design in Distributed NVRAM+RDMA Clusters
Abstract
Recent advances in hardware technologies create new opportunities for architecting storage systems that exploit emerging NVRAM devices, fast RDMA networking, and large numbers of processor cores. Together, these technologies make it possible to build scalable, high-throughput data management systems with low latency and strong consistency guarantees. In this thesis, we investigate the design space for distributed storage systems based on these emerging technologies. The design focuses on three components: a novel high-performance, strongly consistent data access protocol; new communication abstractions; and QoS controls.
We present Telepathy, a novel data access protocol for distributed key-value storage systems. Telepathy supports replicated data storage for fault tolerance and guarantees strong consistency while supporting high-volume concurrent read/write access. Our read protocol can perform (largely) silent consistent reads from any of the replica nodes holding an object, while our write protocol exploits remote atomics and non-volatile buffers to silently resolve write contention.
For inter-server communication, we present a new distributed communication channel (DCC) that separates control and data communication directly at the RNIC. By using different RDMA semantics, our scheme avoids frequent remote processor interruptions and improves latency, throughput, CPU utilization, and memory usage.
For QoS control, we design a new algorithm to support QoS for applications that use one-sided data access operations. A silent token dispatch mechanism informs storage nodes of the real-time throughput of connected clients and adaptively changes the token distribution so that clients meet their target reservations with small overhead.
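The abstract does not spell out the token algorithm, so the following is only an illustrative sketch of what adaptive token distribution could look like, not the thesis's method. It assumes a client's reservation is a whole number of tokens per interval and that it is split across storage nodes in proportion to the throughput the client recently drove at each node. The function name `redistribute_tokens` and its parameters are hypothetical.

```python
def redistribute_tokens(reservation, observed_per_node):
    """Split one client's reservation (tokens per interval) across storage
    nodes in proportion to its recently observed throughput at each node.

    Hypothetical illustration only; not the algorithm from the thesis.
    """
    nodes = sorted(observed_per_node)
    if not nodes:
        return {}

    total = sum(observed_per_node.values())
    if total == 0:
        # No recent traffic observed: fall back to an even split.
        shares = {n: reservation // len(nodes) for n in nodes}
    else:
        # Proportional split, rounded down to whole tokens.
        shares = {n: reservation * observed_per_node[n] // total for n in nodes}

    # Hand out the rounding remainder so the shares sum to the reservation.
    leftover = reservation - sum(shares.values())
    for n in nodes[:leftover]:
        shares[n] += 1
    return shares
```

Under these assumptions, a client that shifts its traffic toward one node receives more of its reserved tokens at that node in the next interval, while the total across nodes stays equal to the reservation.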
Our experiments on an RDMA-enabled cluster using YCSB benchmarks show that our distributed key-value store can achieve microsecond-range reads and writes with small tail latencies, GBps-range data access bandwidth, low CPU utilization, and strong data consistency guarantees. The system also supports QoS reservations with only minor performance impact.
Citation
Liu, Qingyue. "Rethinking Storage System Design in Distributed NVRAM+RDMA Clusters." (2020) Diss., Rice University. https://hdl.handle.net/1911/109637.