Programming the Network Management Stack for Cloud Datacenters

Date
2022-06-07
Abstract

Datacenter networks are the foundation of cloud services. Growing demand for these services places challenging requirements on network management, especially in security, performance, and reliability. Meeting these requirements is difficult, however, because datacenter networks have traditionally contained many opaque, vendor-specific components (e.g., proprietary switch hardware and software). The management stack has only limited control over these components, which makes optimizations hard to achieve.

In recent years, datacenter networks have become more open and programmable. This trend makes it possible for the management stack to exercise programmatic control through a more precise and customizable control loop. In this thesis, we leverage this trend to systematically improve the network management stack, rethinking how security, performance, and reliability goals can be better addressed programmatically. First, to address the unique security issues raised by emerging Remote Direct Memory Access (RDMA) hardware, we developed Bedrock. By co-designing RDMA hardware with programmable switches, Bedrock significantly enhances RDMA security without sacrificing its native performance. Second, to achieve high performance, datacenter networks require granular load balancing in order to operate at high utilization. We built Contra to implement performance-aware routing protocols in a distributed manner. By analyzing the network topology and the user-provided policies, Contra automates the generation of routing protocols for programmable switches, which enforce the policies at hardware speeds. Finally, high reliability is crucial to large networks, and it again falls to the management stack to achieve this goal. We proposed Occam, a system that exposes a shim layer of APIs for network management tasks. Building on these restricted but expressive APIs, our algorithms for scheduling and rollback plan generation provide better transaction semantics for network management tasks and prevent conflicts between them. In all three cases, we show how the network management stack can be drastically improved with programmability.

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Network management, Programmable networks, Datacenter networks
Citation

Hsu, Kuo-Feng. "Programming the Network Management Stack for Cloud Datacenters." (2022) Diss., Rice University. https://hdl.handle.net/1911/113223.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.