
Browsing by Author "Varman, Peter"

Now showing 1 - 12 of 12
  • A Hybrid Genetic Algorithm Towards Network Aware Virtual Machine Placement in Data Centers
    (2019-03-28) Qi, Xiangning; Varman, Peter
    With the explosive growth in the size of datasets in cloud applications, the demands for network bandwidth within a data center are increasing tremendously. Applications running in cloud data centers are commonly composed of clusters of virtual machines (VMs) that communicate extensively with each other, resulting in increased pressure on network bandwidth. Server consolidation exacerbates the problem by placing multiple VMs from possibly different applications on a small set of physical machines and multiplexing server resources among them. Network-aware virtual machine placement (NAVMP) aims to place the VMs in a virtual cluster on the physical servers (hosts) of a data center to minimize the communication bottleneck. The problem is NP-hard, and no existing exact method scales satisfactorily. In this thesis, we propose a hybrid genetic algorithm to solve the NAVMP problem. We use a two-stage approach: a greedy heuristic finds a set of good initial solutions that serve as seeds for a genetic algorithm, which then improves the quality of the solutions. The algorithm tends to place VMs that exchange a large amount of data on the same host when possible, and to align the virtual machine cluster's communications with the physical machine topology during the training process. Simulation results show that our algorithm benefits both traffic flow and load balance in the routers.
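The two-stage structure described in this abstract (greedy seeding followed by genetic refinement) can be sketched in a few dozen lines. This is a minimal illustration, not the thesis's algorithm: the traffic-matrix representation, the capacity handling, and the crossover/mutation operators are textbook assumptions, and capacity repair after crossover is omitted for brevity.

```python
import random

def greedy_seed(traffic, n_hosts, capacity):
    """Greedily co-locate heavily communicating VM pairs on the same host."""
    n_vms = len(traffic)
    placement = [-1] * n_vms
    load = [0] * n_hosts
    # visit VM pairs in decreasing traffic order
    pairs = sorted(((traffic[i][j], i, j) for i in range(n_vms)
                    for j in range(i + 1, n_vms)), reverse=True)
    for _, i, j in pairs:
        for v in (i, j):
            if placement[v] == -1:
                peer = j if v == i else i
                ph = placement[peer]
                # prefer the peer's host if it has room, else least-loaded host
                if ph != -1 and load[ph] < capacity:
                    h = ph
                else:
                    h = min(range(n_hosts), key=lambda x: load[x])
                placement[v] = h
                load[h] += 1
    return placement

def cost(placement, traffic):
    """Total traffic crossing host boundaries (the quantity to minimize)."""
    n = len(placement)
    return sum(traffic[i][j] for i in range(n) for j in range(i + 1, n)
               if placement[i] != placement[j])

def evolve(traffic, n_hosts, capacity, generations=50, pop_size=20, seed=0):
    """Greedy-seeded GA: one-point crossover plus a move-one-VM mutation."""
    rng = random.Random(seed)
    base = greedy_seed(traffic, n_hosts, capacity)
    pop = [base[:] for _ in range(pop_size)]
    for p in pop[1:]:                       # perturb all but the pure seed
        p[rng.randrange(len(p))] = rng.randrange(n_hosts)
    for _ in range(generations):
        pop.sort(key=lambda p: cost(p, traffic))
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]       # one-point crossover
            if rng.random() < 0.3:          # mutation: reassign one VM
                child[rng.randrange(len(child))] = rng.randrange(n_hosts)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda p: cost(p, traffic))
```

Because the unmodified greedy seed always survives elitist selection, the GA's result is never worse than the greedy starting point.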
  • Continuous checkpointing of HTM transactions in NVM
    (ACM, 2017) Giles, Ellis; Doshi, Kshitij; Varman, Peter
    This paper addresses the challenges of coupling byte-addressable non-volatile memory (NVM) and hardware transactional memory (HTM) in high-performance transaction processing. We first show that HTM transactions can be ordered using existing processor instructions, without any hardware changes. In contrast, existing solutions posit changes to HTM mechanisms in the form of special instructions or modified functionality. We exploit the ordering mechanism to design a novel persistence method that decouples HTM concurrency from back-end NVM operations. Failure atomicity is achieved using redo logging coupled with aliasing to guard against mistimed cache evictions. Our algorithm uses efficient lock-free mechanisms with bounded static memory requirements. We evaluated our approach using both micro-benchmarks and benchmarks from the STAMP suite, and showed that it compares well with standard (volatile) HTM transactions. We also showed that it yields significant gains in throughput and latency in comparison with persistent transactional locking.
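The decoupling this abstract describes (establish a commit order inside the concurrency mechanism, then drain redo records to NVM in that order in the background) can be modeled in plain Python. This is a toy simulation, not the paper's implementation: the lock stands in for HTM atomicity and the dictionaries for persistent memory, and all names are invented.

```python
import itertools
import threading

class OrderedRedoLog:
    """Simulated back end: each transaction obtains a commit order inside
    the (simulated) HTM region; redo records are drained to 'NVM' strictly
    in commit order, so recovery always sees a consistent prefix."""

    def __init__(self):
        self.seq = itertools.count()   # global counter read at commit time
        self.lock = threading.Lock()   # stands in for HTM atomicity
        self.log = {}                  # commit order -> redo record
        self.nvm = {}                  # simulated persistent state
        self.next_to_drain = 0

    def run_txn(self, updates):
        # The critical section models an HTM transaction: perform the
        # updates and read the counter to establish a total commit order.
        with self.lock:
            order = next(self.seq)
            record = dict(updates)
        self.log[order] = record       # redo record written outside HTM
        return order

    def drain(self):
        # Background persistence: apply redo records in commit order only.
        while self.next_to_drain in self.log:
            for key, val in self.log.pop(self.next_to_drain).items():
                self.nvm[key] = val
            self.next_to_drain += 1
```

Draining in strict commit order is what makes recovery after a crash land on a transaction-consistent prefix of the history.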
  • Design and Implementation of I/O Servers Using the Device File Boundary
    (2015-07-31) Amiri Sani, Ardalan; Zhong, Lin; Varman, Peter; Wallach, Dan; Vasudevan, Venu
    Due to historical reasons, today's computer systems treat I/O devices as second-class citizens, supporting them with ad hoc and poorly-developed system software. As I/O devices are getting more diverse and are taking a central role in modern systems, from mobile systems to servers, such second-class system support hinders novel system services such as I/O virtualization and sharing. The goal of this thesis is to tackle these challenges by rethinking the system support for I/O devices. For years, research on I/O devices was limited largely to network and storage devices. However, a diverse set of I/O devices is increasingly important for emerging computing paradigms. For modern mobile systems such as smartphones and tablets, I/O devices such as sensors and actuators are essential to the user experience. At the same time, high-performance computers in datacenters are embracing hardware specialization, or accelerators, such as GPUs, DSPs, and crypto accelerators, to improve system performance and efficiency as Dennard scaling has ended. Modern systems also treat such specialized hardware as I/O devices. Since I/O devices are becoming the fundamental service provided by many computer systems, we suggest that they should be treated as I/O servers that are securely accessible to other computers, i.e., clients, as well. I/O servers will be the fundamental building blocks of future systems, enabling the novel system services mentioned above. For example, they enable a video chat application running on a tablet to use the camera on the user's smart glasses and, for better consolidation, enable all applications running in a datacenter to share an accelerator cluster over the network. We address two fundamental challenges of I/O servers: remote access and secure sharing. Remote access enables an application in one machine, either virtual or physical, to use an I/O device in a different machine.
We use a novel boundary for remote access: Unix device files, which are used in Unix-like operating systems to abstract various I/O devices. Using the device file boundary for remote access requires low engineering effort, as the boundary is common to many classes of I/O devices. In addition, we show that this boundary achieves high performance, supports legacy applications and I/O devices, supports multiple clients, and makes all features of I/O devices available to unmodified applications. An I/O server must provide security guarantees for untrusting clients. Using the device file boundary, a malicious client can exploit the -- very common -- security bugs in device drivers to compromise the I/O server and hence other clients. We propose two solutions to this problem. First, if available in the I/O server, we use a trusted hypervisor to enforce fault and device-data isolation between clients. This solution assumes the driver is compromised and hence cannot guarantee functional correctness. Therefore, as a second solution, we present a novel device driver design, called library drivers, that minimizes the device driver's Trusted Computing Base (TCB) size and attack surface and hence reduces the possibility of driver-based exploits. Using our solutions for remote access and secure sharing, we demonstrate that I/O servers enable novel system services: (i) I/O sharing between virtual machines, i.e., I/O virtualization, where virtual machines (VMs) share the I/O devices of the underlying physical machine; (ii) I/O sharing between mobile systems, where one mobile system uses the I/O devices of another system over a wireless connection; and (iii) I/O sharing between servers in a datacenter, where the VMs in one server use the I/O devices of other servers over the network.
  • Enabling QoS Controls in Modern Distributed Storage Platforms
    (2020-10-08) Peng, Yuhan; Varman, Peter
    Distributed storage systems provide a scalable approach for hosting multiple clients on a consolidated storage platform. The use of shared infrastructure can lower costs but exacerbates the problem of fairly allocating the IO resources. Providing performance Quality-of-Service (QoS) guarantees in a distributed storage environment poses unique challenges. Workload demands of clients shift unpredictably between servers as their locality and IO intensities fluctuate. This complicates the problem of providing QoS controls like reservations and limits that are based on aggregate client service, as well as providing differentiated tail latency guarantees to the clients. In this thesis, we present novel approaches for providing bandwidth allocation and response time QoS in distributed storage platforms. For bandwidth allocation QoS, we develop a token-based scheduling framework to guarantee the maximum and minimum aggregate throughput of different clients. We introduce a novel algorithm called pTrans for solving the token allocation problem. pTrans is provably optimal and has better theoretical and empirical scalability than competing approaches based on linear-programming or max-flow formulations. For the response time QoS, we introduce Fair-EDF, a framework that extends the earliest deadline first (EDF) scheduler to provide fairness control while supporting latency guarantees.
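The reservation/limit style of bandwidth control described in this abstract can be illustrated with a simple two-pass water-filling allocator. This is a generic sketch of the QoS model, not the pTrans algorithm: the client names, the `(reservation, limit)` tuples, and the two-pass structure are all assumptions.

```python
def allocate_tokens(total, clients):
    """Split `total` tokens among clients given {name: (reservation, limit)}.

    Pass 1 satisfies reservations (scaled down if the total is too small);
    pass 2 water-fills the surplus, capping each client at its limit."""
    grant = {}
    remaining = total
    need = sum(r for r, _ in clients.values())
    scale = min(1.0, total / need) if need else 0.0
    for c, (r, _) in clients.items():
        grant[c] = r * scale
        remaining -= grant[c]
    while remaining > 1e-9:
        # clients still below their limit share the surplus equally
        open_clients = [c for c in clients
                        if grant[c] < clients[c][1] - 1e-9]
        if not open_clients:
            break                       # everyone capped; tokens left over
        share = remaining / len(open_clients)
        for c in open_clients:
            extra = min(share, clients[c][1] - grant[c])
            grant[c] += extra
            remaining -= extra
    return grant
```

With `total=100` and clients A:(20, 50), B:(10, 40), C:(0, 30), every client receives its reservation first and the surplus is split evenly; with `total=200` all clients saturate at their limits and the loop stops.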
  • Hardware Transactional Persistent Memory
    (2019-01-31) Giles, Ellis Robinson; Varman, Peter
    Recent years have witnessed a sharp shift towards real-time data-driven and high-throughput applications, impelled by pervasive multi-core architectures and parallel programming models. This shift has spurred a broad adoption of in-memory databases and massively-parallel transaction processing across scientific, business, and industrial application domains. However, these applications are severely handicapped by the difficulties in maintaining persistence on typical durable media like hard-disk drives (HDDs) and solid-state drives (SSDs) without sacrificing either performance or reliability. The end of Moore's Law and Dennard scaling has further slowed the performance gains and scalability of these applications. Two emerging hardware developments hold enormous promise for transformative gains in both the speed and scalability of concurrent data-intensive applications. The first is the arrival of Persistent Memory, or PM, a generic term for byte-addressable non-volatile memories, such as Intel's 3D XPoint technology. The second is the availability of CPU-based transaction support known as Hardware Transactional Memory, or HTM, which makes it easier for applications to exploit multi-core concurrency without the need for expensive lock-based software. This thesis introduces Hardware Transactional Persistent Memory, the first union of HTM with PM without any changes to known processor designs or protocols, allowing for high-performance, concurrent, and durable transactions. The techniques presented rest on three pillars: handling uncontrolled cache evictions from the processor cache hierarchy, logging to resist failure during persistent memory updates, and transaction ordering to permit consistent recovery from a machine crash. We develop pure software solutions that work with existing processor architectures, as well as software-assisted solutions that exploit external memory controller hardware support.
The thesis also introduces the notion of relaxed versus strict durability, allowing individual applications to trade off performance against robustness while guaranteeing recovery to a consistent system state.
  • Method for assuring quality of service in distributed storage system, control node, and system
    (2022-05-03) Yu, Si; Gong, Junhui; Varman, Peter; Peng, Yuhan; Rice University; Huawei Technologies Co., Ltd.; William Marsh Rice University; United States Patent and Trademark Office
    The present disclosure discloses a method for assuring quality of service in a storage system, in which a control node calculates, based on the quantity of remaining I/O requests of a target storage node in a unit time, the quantity of I/O requests required by a storage resource to reach a lower assurance limit in the unit time, and the quantity of I/O requests that need to be processed by the target storage node for the storage resource in the unit time, a lower-limit quantity of I/O requests that can be processed by the target storage node for the storage resource in the unit time; allocates, based on this lower-limit quantity of I/O requests, a lower-limit quantity of tokens of the storage resource on the target storage node in the unit time to the storage resource; and sends the lower-limit quantity of tokens to the target storage node.
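One plausible reading of the calculation in this claim is a three-way minimum over the three quantities the control node considers. The sketch below is an interpretation for illustration only, not the patented method, and the parameter names are invented.

```python
def lower_limit_tokens(node_remaining, needed_for_floor, node_demand):
    """Lower-limit token grant for one storage resource on one storage node.

    Bounded by: the I/O requests the node can still process in the unit
    time (node_remaining), the requests the resource still needs to reach
    its assured floor (needed_for_floor), and the requests the resource
    will actually direct at this node (node_demand)."""
    return min(node_remaining, needed_for_floor, node_demand)
```

Under this reading, the grant can never exceed what the node can serve, what the floor requires, or what the resource will actually use there.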
  • Non-intrusive Persistence with a Backend NVM Controller
    (IEEE, 2015) Pu, Libei; Doshi, Kshitij; Giles, Ellis; Varman, Peter
    By providing instruction-grained access to vast amounts of persistent data with ordinary loads and stores, byte-addressable storage class memory (SCM) has the potential to revolutionize system architecture. We describe a non-intrusive SCM controller that achieves light-weight failure atomicity through back-end operations. Our solution avoids costly software intervention by decoupling isolation and concurrency-driven atomicity from failure atomicity and durability, and does not require changes to the front-end cache hierarchy. Two implementation alternatives are described: one using a hardware structure, and the other extending the memory controller with a firmware-managed volatile space.
  • Performance Analysis of Program Executions on Modern Parallel Architectures
    (2014-07-25) Liu, Xu; Mellor-Crummey, John; Sarkar, Vivek; Varman, Peter; Browne, James
    Parallel architectures have become common in supercomputers, data centers, and mobile chips. Usually, parallel architectures have complex features: many hardware threads, deep memory hierarchies, and non-uniform memory access (NUMA). Program designs without careful consideration of these features may lead to poor performance on such architectures. First, multi-threaded programs can suffer performance degradation caused by imbalanced workload, overuse of synchronization, and parallel overhead. Second, parallel programs may suffer from the long latency to main memory. Third, in a NUMA system, memory accesses can be remote rather than local. Without a NUMA-aware design, a threaded program may have many costly remote accesses and imbalanced memory requests to NUMA domains. Performance tools can help us take full advantage of the power of parallel architectures by providing insight into where and why a program fails to obtain top performance. This dissertation addresses the difficulty of obtaining insights about performance bottlenecks in parallel programs using lightweight measurement techniques. This dissertation makes four contributions. First, it describes a novel performance analysis method for OpenMP programs, which can identify the root causes of performance losses. Second, it presents a data-centric analysis method that associates performance metrics with data objects. This data-centric analysis can identify both a program's problematic memory accesses and the associated variables; this information can help an application developer optimize programs for better locality. Third, this dissertation discusses the development of a lightweight method that collects memory reuse distances to guide cache locality optimization. Finally, it describes a lightweight profiling method that can help pinpoint performance losses in programs on NUMA architectures and provide guidance about how to transform the program to improve performance.
To validate the utility of these methods, I implemented them in HPCToolkit, a state-of-the-art profiler developed at Rice University. I used the extended HPCToolkit to study several parallel programs. Guided by the performance insights provided by the new techniques introduced in this dissertation, I optimized all of these programs and was able to obtain non-trivial improvements to their performance. The measurement overhead incurred by these new analysis methods is very small in both runtime and memory.
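The reuse-distance metric mentioned in the third contribution has a precise definition: the number of distinct addresses touched between two consecutive accesses to the same address. A naive stack-based computation (not HPCToolkit's lightweight sampled scheme) looks like this:

```python
def reuse_distances(trace):
    """Reuse distance of each access in `trace`: the number of distinct
    addresses touched since the previous access to the same address
    (infinity on a first access). O(n*m) naive LRU-stack algorithm."""
    stack = []        # LRU stack of addresses, most recently used last
    dists = []
    for addr in trace:
        if addr in stack:
            # depth below the top of the stack = distinct addresses between
            depth = len(stack) - 1 - stack.index(addr)
            dists.append(depth)
            stack.remove(addr)
        else:
            dists.append(float("inf"))
        stack.append(addr)            # addr becomes most recently used
    return dists
```

An access with reuse distance d hits in a fully associative LRU cache of more than d lines, which is why the metric guides cache locality optimization.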
  • Persisting in-memory databases using SCM
    (IEEE, 2016) Giles, Ellis; Doshi, Kshitij; Varman, Peter
    Big Data applications need to be able to access large amounts of variable data as fast as possible. Emerging Storage Class Memory (SCM) fits this need by making memory available in large capacity while making changes endure as a seamless continuation of load-store accesses through processor caches. However, when writing values into a persistent memory tier, programmers are faced with the dual problems of controlling untimely cache evictions that might commit changes prematurely, and of grouping changes and making them durable as a unit so that consistency can be guaranteed in the event of sudden failure. In this paper, we present various methods to achieve high-performance byte-addressable persistence for an in-memory data store. We chose Redis, a popular high-performance memory-oriented key-value database, and modified its source code to use SCM such that updates to data and structures are performed in a failure-resilient manner. We evaluated the changes using both internal benchmarks and the Yahoo! Cloud Serving Benchmark (YCSB). We found that even though Redis uses many SCM read operations, it can benefit from highly optimized persistent SCM write-based approaches, especially when SCM write times are longer than DRAM write times. The paper presents an innovative Local Alias Table Batched (LATB) method and shows that it outperforms the alternatives.
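The flavor of the alias-table idea named here (buffer a transaction's stores in a local alias table, read through it for read-your-own-writes, and persist the whole batch at commit) can be modeled with plain dictionaries. This is a simulation for intuition only; the real LATB method operates on SCM addresses, logging, and cache-line flushes, and the class name is invented.

```python
class AliasTableStore:
    """Writes inside a transaction land in a per-transaction alias table;
    reads consult the alias table first; commit flushes the batch to the
    simulated SCM, and abort simply discards the table."""

    def __init__(self):
        self.scm = {}         # simulated persistent store
        self.alias = None     # alias table of the open transaction, if any

    def begin(self):
        self.alias = {}

    def write(self, key, value):
        self.alias[key] = value            # buffered, not yet persistent

    def read(self, key):
        if self.alias and key in self.alias:
            return self.alias[key]         # read-your-own-writes
        return self.scm.get(key)

    def commit(self):
        # a real implementation would persist a log entry for the batch
        # before applying it, so a crash mid-apply can be redone
        self.scm.update(self.alias)
        self.alias = None

    def abort(self):
        self.alias = None                  # uncommitted updates vanish
```

Buffering writes this way also sidesteps the untimely-cache-eviction problem the abstract describes: nothing reaches the persistent tier before commit.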
  • Software Support for Atomicity and Persistence in Non-volatile Memory
    (Memory Organization and Architecture Workshop, 2013-10) Giles, Ellis; Doshi, Kshitij; Varman, Peter
    Advances in memory technology are promising the availability of byte-addressable persistent memory as an integral component of future computing platforms. This change has significant implications for software that has traditionally made a sharp distinction between durable and volatile storage. In this paper we describe a software framework for persistent memory that provides atomicity and durability while simultaneously ensuring that fast paths through the cache, DRAM, and persistent memory layers are not slowed down.
  • Software Support for Efficient Use of Modern Computer Architectures
    (2015-08-14) Chabbi, Milind Mohan; Mellor-Crummey, John; Sarkar, Vivek; Varman, Peter; Iancu, Costin
    Parallelism is ubiquitous in modern computer architectures. Heterogeneity of CPU cores and deep memory hierarchies make modern architectures difficult to program efficiently. Achieving top performance on supercomputers is difficult due to complex hardware, software, and their interactions. Production software systems fail to achieve top performance on modern architectures broadly due to three main causes: resource idleness, parallel overhead, and data movement overhead. This dissertation presents novel and effective performance analysis tools, adaptive runtime systems, and architecture-aware algorithms to understand and address these problems. Many future high performance systems will employ traditional multicore CPUs augmented with accelerators such as GPUs. One of the biggest concerns for accelerated systems is how to make best use of both CPU and GPU resources. Resource idleness arises in a parallel program due to insufficient parallelism and load imbalance among other causes. To assess systemic resource idleness arising in GPU-accelerated architectures, we developed efficient profiling and tracing capabilities. We introduce CPU-GPU blame shifting--a novel technique to pinpoint and quantify the causes of resource idleness in GPU-accelerated architectures. Parallel overheads arise due to synchronization constructs such as barriers and locks used in parallel programs. We developed a new technique to identify and eliminate redundant barriers at runtime in Partitioned Global Address Space programs. In addition, we developed a set of novel mutual exclusion algorithms that exploit locality in the memory hierarchy to improve performance on Non-Uniform Memory Access architectures. In modern architectures, inefficient or unnecessary memory accesses can severely degrade program performance. To pinpoint and quantify wasteful memory operations, we developed a fine-grain execution-monitoring framework. 
We extended this framework and demonstrated the feasibility of attributing fine-grain execution metrics to source and data in their contexts for long-running programs--a task previously thought to be infeasible. Together, the solutions described in this dissertation were employed to gain insights into the performance of a collection of important programs, both parallel and serial. The insights we gained enabled us to improve the performance of many of these programs by a significant margin. Software for future systems will benefit from the techniques described in this dissertation.
  • Transaction local aliasing in storage class memory
    (IEEE, 2015) Giles, Ellis; Doshi, Kshitij; Varman, Peter
    This paper describes a lightweight software library to solve the challenges [6], [3], [1], [5], [2] of programming storage class memory (SCM). It provides primitives to demarcate failure-atomic code regions. SCM loads and stores within each demarcated code region (called a “wrap”) are routed through the library, which buffers updates and transmits them to SCM locations asynchronously while allowing their speedy propagation from writers to readers through CPU caches.
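A rough Python analogue of a "wrap" (stores buffered inside a demarcated region and only reaching simulated SCM when the region completes) could look like the sketch below. The class and method names are invented, and the real library intercepts native loads and stores rather than dictionary operations; an exception inside the region stands in for a failure before the wrap closes.

```python
from contextlib import contextmanager

class SCM:
    """Simulated storage class memory with wrap-demarcated atomic regions."""

    def __init__(self):
        self.mem = {}       # simulated persistent locations
        self._buf = None    # store buffer of the open wrap, if any

    @contextmanager
    def wrap(self):
        """Failure-atomic region: stores are buffered and propagate to
        'SCM' only if the region runs to completion."""
        self._buf = {}
        try:
            yield self
            # region completed: propagate the whole batch (the paper does
            # this asynchronously while readers are served via caches)
            self.mem.update(self._buf)
        finally:
            self._buf = None    # on failure, buffered stores are discarded

    def store(self, addr, val):
        target = self._buf if self._buf is not None else self.mem
        target[addr] = val

    def load(self, addr):
        # readers see the wrap's own buffered updates first
        if self._buf is not None and addr in self._buf:
            return self._buf[addr]
        return self.mem.get(addr)
```

A failure mid-wrap (modeled by the raised exception) leaves the persistent state exactly as it was before the region began.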