Browsing by Author "Ng, T. S. Eugene"
Now showing 1 - 20 of 39
Item: Analysis of Hadoop’s Performance under Failures (2011-08-11)
Authors: Dinu, Florin; Ng, T. S. Eugene
Failures are common in today’s data center environment and can significantly impact the performance of important jobs running on top of large-scale computing frameworks. In this paper we analyze Hadoop’s behavior under compute-node and process failures. Surprisingly, we find that even a single failure can have a large detrimental effect on job running times. We uncover several important design decisions underlying this distressing behavior: the inefficiency of Hadoop’s statistical speculative execution algorithm, the lack of sharing of failure information, and the overloading of TCP failure semantics. We hope that our study will add new dimensions to the pursuit of robust large-scale computing framework designs.

Item: COMMA: Coordinating the Migration of Multi-tier Applications (2014-11-24)
Authors: Liu, Zhaolei; Ng, T. S. Eugene; Sripanidkulchai, Kunwadee; Zheng, Jie
Multi-tier applications are widely deployed in today’s virtualized cloud computing environments. At the same time, management operations in these environments, such as load balancing, hardware maintenance, and workload consolidation, often make use of live virtual machine (VM) migration to control the placement of VMs. Although existing solutions can migrate a single VM efficiently, little attention has been devoted to migrating the related VMs of a multi-tier application. Ignoring the relatedness of VMs during migration can lead to serious application performance degradation. This paper formulates the multi-tier application migration problem, and presents a new communication-impact-driven coordinated approach, as well as a system called COMMA that realizes this approach.
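The coordination idea above — pacing related migrations so they finish together rather than one at a time — can be sketched in a toy pre-copy model. The formula, parameter names, and numbers below are illustrative assumptions, not COMMA’s actual algorithm:

```python
# Illustrative sketch (assumed model, not COMMA's algorithm): split a fixed
# migration bandwidth B across the VMs of one application so that all
# migrations finish at the same time, avoiding a long window in which the
# application runs partially migrated.
#
# Simplified pre-copy model: a VM with memory size s and page-dirty rate d,
# migrated at bandwidth b > d, finishes in roughly s / (b - d).

def coordinate_migration(sizes, dirty_rates, total_bw):
    """Return (finish_time, per-VM bandwidths) so all VMs finish together.

    Setting s_i / (b_i - d_i) = T for every VM and summing b_i = B gives
    T = sum(s_i) / (B - sum(d_i)) and b_i = d_i + s_i / T.
    """
    if total_bw <= sum(dirty_rates):
        raise ValueError("bandwidth cannot outrun aggregate dirty rate")
    finish = sum(sizes) / (total_bw - sum(dirty_rates))
    bws = [d + s / finish for s, d in zip(sizes, dirty_rates)]
    return finish, bws

# Two hypothetical VMs (sizes 10 and 20, dirty rate 1 each, 5 units of
# bandwidth): both finish at T = 30 / (5 - 2) = 10.
t, bws = coordinate_migration([10.0, 20.0], [1.0, 1.0], 5.0)
```

The closed form also shows why coordination matters: migrating the VMs sequentially at full bandwidth would leave each tier split across hosts while the others migrate.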
Through extensive testbed experiments, numerical analyses, and a demonstration of COMMA on Amazon EC2, we show that this approach is highly effective in minimizing migration’s impact on multi-tier applications’ performance.

Item: Control Plane Design and Performance Analysis for Optical Multicast-Capable Datacenter Networks (2014-04-18)
Authors: Xia, Yiting; Ng, T. S. Eugene; Jermaine, Christopher M.; Cox, Alan L.
This study presents a control plane design for an optical multicast-capable datacenter network and evaluates the system performance using simulations. The increasing number of datacenter applications with heavy one-to-many communications has raised the need for an efficient group data delivery solution. We propose a clean-slate architecture that uses optical multicast technology to enable ultra-fast, energy-efficient, low-cost, and highly reliable group data delivery in the datacenter. Since the optical components are agnostic of existing communication protocols, we design novel control mechanisms to coordinate datacenter applications with the optical network. Applications send explicit requests for group data delivery through an API exposed by a centralized controller. Based on the collected traffic demands, the controller computes optical resource allocations, using a proposed control algorithm that maximizes utilization of the optical network. Finally, the controller changes the optical network topology according to the computed allocation and sets forwarding rules to route traffic onto the correct data paths. We evaluate the optimality and complexity of the control algorithm with real datacenter traffic. It achieves near-optimal solutions in almost all experiment cases and finishes computation almost instantaneously even for a large datacenter setting.
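To make the controller’s allocation step concrete, here is a hypothetical greedy sketch. The port-accounting model, the benefit metric, and the function names are assumptions for illustration, not the control algorithm evaluated in the work above:

```python
# Assumed sketch: given application requests for group data delivery and a
# limited number of optical splitter ports, greedily admit the requests that
# move the most traffic per port onto the optical network; the rest fall
# back to the packet-switched network.

def allocate_optical(requests, splitter_ports):
    """requests: list of (group_id, fanout, volume).

    A group with fanout f is assumed to consume 1 + f ports (one sender
    port plus one port per receiver). Rank by offloaded traffic per port,
    i.e. (volume * fanout) / (1 + fanout), and admit while ports last."""
    ranked = sorted(requests,
                    key=lambda r: r[2] * r[1] / (1 + r[1]),
                    reverse=True)
    admitted, remaining = [], splitter_ports
    for gid, fanout, volume in ranked:
        ports = 1 + fanout
        if ports <= remaining:
            admitted.append(gid)
            remaining -= ports
    return admitted

# Three hypothetical groups competing for 8 splitter ports.
groups = [("a", 3, 100), ("b", 7, 10), ("c", 2, 90)]
chosen = allocate_optical(groups, 8)
```

With these numbers the high-volume, moderate-fanout groups "a" and "c" are admitted, while "b" (large fanout, little data) stays on the packet network.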
We also develop a set of simulators to compare the performance of our system against a number of state-of-the-art group data delivery approaches, such as a non-blocking datacenter architecture, datacenter BitTorrent, and datacenter IP multicast. Extensive simulations using synthetic traffic show that our solution can provide an order-of-magnitude performance improvement. Tradeoffs of our system are analyzed quantitatively as well.

Item: Controlling Race Conditions in OpenFlow to Accelerate Application Verification and Packet Forwarding (2014-10-24)
Authors: Sun, Xiaoye Steven; Ng, T. S. Eugene; Knightly, Edward W.; Zhong, Lin
OpenFlow is a Software Defined Networking (SDN) protocol that is being deployed in critical network systems. SDN application verification plays an important role in guaranteeing the correctness of the application. Through our investigation, we discover that application verification can be very inefficient under the OpenFlow protocol because there are many race conditions between data packets and control plane messages. Furthermore, these race conditions also increase the control plane workload and the packet forwarding delay. We propose Attendre, an OpenFlow extension, to mitigate the ill effects of the race conditions in OpenFlow networks. We have implemented Attendre in NICE (a model-checking verifier), Open vSwitch (a software virtual switch) and NOX (an OpenFlow control platform). Experiments show that Attendre can reduce verification time by several orders of magnitude, and can significantly reduce TCP connection setup time.

Item: Design and implementation of the Maestro network control platform (2009)
Authors: Cai, Zheng; Ng, T. S. Eugene
Computer network operation is inherently complex because it comprises many functions, such as routing, firewalling, VPN provisioning, traffic load-balancing, and network maintenance. To cope with this, network designers have created modular components to handle each function.
Unfortunately, in reality, unavoidable dependencies exist among some of the components, and they may interact accidentally. There is no single mechanism for systematically governing the interactions among the various components. In addition, routing is mainly realized by distributed routing protocols for higher survivability. Some other components need to be centralized, because either they have no obvious mapping onto distributed computations, or centralization achieves more optimal solutions. Both distributed control and centralized control are therefore necessary in network management. However, the interaction between distributed and centralized controls makes the problem even more complicated. No existing study has considered either how to systematically manage the interactions among network control components, or how distributed and centralized control systems could collaborate to leverage the inherent advantages of both. To address these problems, we propose a system called Maestro. Maestro orchestrates the network control components that govern the behavior of a network, and enables the collaboration between distributed and centralized control components. Maestro provides abstractions for the modular implementation of network control components, and addresses the fundamental problems originating from the concurrent operation of network control components, namely communication between components, scheduling of component executions, concurrency management, and protection enforcement. Maestro allows distributed and centralized control components to collaborate in four different ways, each with different strengths and weaknesses. In this thesis we present the design and implementation of a prototype of Maestro, and evaluate the performance and effectiveness of Maestro’s mechanisms.

Item: Designing Hybrid Data Center Networks for High Performance and Fault Tolerance (2020-01-08)
Authors: Wu, Dingming; Ng, T. S. Eugene
This thesis explores the design space of hybrid electrical and optical network architectures for modern data centers. It strikes a delicate balance among performance, fault tolerance, scalability, and cost through the coordinated use of both electrical and optical components in the network. We have developed several approaches to achieving these goals from different angles. First, we used optical splitters as key building blocks to improve multicast transmission performance. We built an unconventional optical multicast architecture, called HyperOptics, that provides orders of magnitude of throughput improvement for multicast transmissions. Second, we developed a failure-tolerant network, called ShareBackup, by embedding optical switches into Clos networks. ShareBackup, for the first time, achieves network-wide full-capacity failure recovery in milliseconds. Third, we proposed to enable a programmable network topology at runtime by inserting optical switches at the network edge. Our system, called RDC, breaks the bandwidth boundaries between servers and dynamically optimizes its topology according to traffic patterns. Through these three works, we demonstrate the high potential of hybrid datacenter network architectures for high performance and fault tolerance.

Item: Designing Scalable Networks for Future Large Datacenters (2012-09-05)
Authors: Stephens, Brent; Cox, Alan L.; Rixner, Scott; Ng, T. S. Eugene; Carter, John
Modern datacenters require a network with high cross-section bandwidth, fine-grained security, support for virtualization, and simple management that can scale to hundreds of thousands of hosts at low cost. This thesis first presents the firmware for Rain Man, a novel datacenter network architecture that meets these requirements, and then performs a general scalability study of the design space. The firmware for Rain Man, a scalable Software-Defined Networking architecture, employs novel algorithms and uses previously unused forwarding hardware.
This allows Rain Man to scale at high performance to networks of forty thousand hosts on arbitrary network topologies. In the general scalability study of the design space of SDN architectures, this thesis identifies three architectural dimensions common among the networks: source versus hop-by-hop routing, the granularity at which flows are routed, and arbitrary versus restrictive routing. It finds that a source-routed network with host-pair routing granularity and arbitrary routes is the most scalable.

Item: Efficient traffic trajectory error detection (2010)
Authors: Zhang, Bo; Ng, T. S. Eugene
Our recent survey of publicly reported router bugs shows that many router bugs, once triggered, can cause various traffic trajectory errors, including traffic deviating from its intended forwarding paths, traffic being mistakenly dropped, and unauthorized traffic bypassing packet filters. These traffic trajectory errors are serious problems because they may cause network applications to fail and create security loopholes for network intruders to exploit. Therefore, traffic trajectory errors must be quickly and efficiently detected so that corrective action can be performed in a timely fashion. Detecting traffic trajectory errors requires real-time tracking of the control states (e.g., forwarding tables, packet filters) of routers and scalable monitoring of the actual traffic trajectories in the network. Trajectory errors can then be detected by efficiently comparing the observed traffic trajectories against the intended control states. Making such trajectory error detection efficient and practical for large-scale, high-speed networks requires us to address many challenges. First, existing traffic trajectory monitoring algorithms require the simultaneous monitoring of all network interfaces for the packets of interest, which causes a daunting monitoring overhead.
To improve the efficiency of traffic trajectory monitoring, we propose the router group monitoring technique, which monitors only the periphery interfaces of a set of selected router groups. We analyze a large number of real network topologies and show that effective router groups with high trajectory error detection rates exist in all cases. We then develop an analytical model for quickly and accurately estimating the detection rates of different router groups. Based on this model, we propose an algorithm to select a set of router groups that achieves complete error detection with low monitoring overhead. Second, maintaining the control states of all the routers in the network requires a significant amount of memory. However, there exist no studies on how to efficiently store multiple complex packet filters. We propose to store multiple packet filters using a shared HyperCuts decision tree. To help decide which subsets of packet filters should share a HyperCuts decision tree, we first identify a number of important factors that collectively impact the efficiency of the resulting shared tree. Based on the identified factors, we then propose to use machine learning techniques to predict whether any pair of packet filters should share a tree. Given the pair-wise prediction matrix, a greedy heuristic algorithm classifies the packet filters into a number of shared HyperCuts decision trees. Our experiments using both real and synthetic packet filters show that shared HyperCuts decision trees require considerably less memory while having the same or a slightly greater average height than separate trees. In addition, shared HyperCuts decision trees enable concurrent lookup of multiple packet filters sharing the same tree. Finally, based on the two proposed techniques, we have implemented a complete prototype system that is compatible with Juniper's JUNOS.
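The pairwise-prediction-plus-greedy grouping step described above can be illustrated with a toy version. The first-fit rule and the example predictor are assumptions for illustration, not the thesis’s actual heuristic:

```python
# Assumed sketch: given a pairwise predictor of whether two packet filters
# would share a decision tree efficiently, place each filter into the first
# existing group all of whose members it is predicted to share with;
# otherwise start a new group (i.e., a new shared tree).

def group_filters(filters, should_share):
    """filters: list of filter ids; should_share(a, b) -> bool (symmetric)."""
    groups = []
    for f in filters:
        for g in groups:
            if all(should_share(f, m) for m in g):
                g.append(f)
                break
        else:                       # no compatible group found
            groups.append([f])
    return groups

# Hypothetical predictor for the example: filters share a tree iff their
# ids have the same parity (standing in for the learned classifier).
parity = lambda a, b: a % 2 == b % 2
trees = group_filters([1, 2, 3, 4, 5], parity)
```

Here the five filters collapse into two shared trees, `[1, 3, 5]` and `[2, 4]`; in the real system the predictor is a trained model rather than a toy rule.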
We have shown in the thesis that, to detect traffic trajectory errors, it is sufficient to selectively implement only a small set of the key functions of a full-fledged router on our prototype, which makes the prototype simpler and less error-prone. We conduct both Emulab experiments and micro-benchmark experiments to show that the system can efficiently track router control states, monitor traffic trajectories, and detect traffic trajectory errors.

Item: Exploiting Internet Delay Space Properties for Sybil Attack Mitigation (2008-06-02)
Authors: Ng, T. S. Eugene; Zhang, Bo
Recent studies have discovered that the Internet delay space has many interesting properties, such as triangle inequality violations (TIV), clustering structures, and constrained growth. Understanding these properties has so far benefited the design of network models and network-performance-aware systems. In this paper, we consider an interesting, previously unexplored connection between Internet delay space properties and network locations. We show that this connection can be exploited to mitigate the Sybil attack problem in peer-to-peer systems.

Item: Gleaning network wide congestion information from packet markings (2010)
Authors: Dinu, Florin; Ng, T. S. Eugene
Congestion information can greatly benefit network-level decisions. For example, fast-reroute algorithms should leverage congestion information when computing backup paths. They could also use the information to monitor whether the re-routing decision itself causes congestion in the network. Today, most solutions for inferring congestion work at the end-host level and relay end-to-end congestion information to transport protocols. Network-level decisions, on the other hand, may need link-level congestion information. Unfortunately, the mechanisms that routers can use to infer link-level congestion information are insufficient. Such information could potentially be obtained by periodically sharing estimates between routers.
However, this solution increases the traffic load on the network and has difficulty reliably delivering the estimates during periods of congestion. In this thesis we show that routers inside an autonomous system can easily and accurately infer congestion information about each other. Routers first measure path-level congestion information using only the congestion markings in the traffic that they forward. Next, we propose that routers combine routing information with the path-level congestion information to obtain a more detailed description of the congestion in the network. Link-level congestion information can be computed using this approach. Our techniques never add supplementary traffic to the network and use few router resources. They can be deployed incrementally and in heterogeneous environments. We show that the accuracy of the inference is good, using experiments with multiple traffic patterns and various congestion levels.

Item: Gleaning Network-Wide Congestion Information from Packet Markings (2010-06-29)
Authors: Dinu, Florin; Ng, T. S. Eugene
Distributed control protocols routinely have to operate oblivious to dynamic network information for scalability or complexity reasons. However, more informed protocols are likely to make more intelligent decisions. We argue that protocols can leverage dynamic congestion information without suffering the aforementioned penalties. In this paper we show that routers can readily exchange congestion information in a purely passive fashion using congestion markings from existing traffic. As a result, each router can locally infer a congestion map of the network. Moreover, the maps are continuously updated with near-real-time information. Our solution for building the congestion maps leverages standardized and widely used congestion management protocols and does not require changes to end hosts.
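The path-to-link inference idea can be illustrated with a minimal example under an assumed model (not the thesis’s exact technique): if each link marks packets independently, marking probabilities compose along a path as p_path = 1 - Π(1 - p_link), so a router observing two paths that differ by a single link can attribute the extra marking to that link:

```python
# Assumed sketch of link-level inference from path-level ECN-style
# markings: given the observed marking fraction on a path over links
# L1..Lk and on a path over L1..Lk plus one extra link, recover the
# extra link's marking probability from the independence model
#   p_path = 1 - prod(1 - p_link).

def infer_link(p_shorter, p_longer):
    """Marking probability of the one link by which the paths differ."""
    if not 0.0 <= p_shorter < 1.0:
        raise ValueError("shorter-path marking fraction out of range")
    return 1.0 - (1.0 - p_longer) / (1.0 - p_shorter)

# Hypothetical links A (10% marking) and B (20% marking): traffic crossing
# only A shows 10% markings; traffic crossing A then B shows
# 1 - 0.9 * 0.8 = 28% markings. B's marking rate is recoverable:
p_b = infer_link(0.10, 0.28)
```

Combining many such path observations with the routing tables yields marking estimates for every link, which is the essence of the map-building step described above.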
We find that the inference accuracy is within 10% roughly 90% of the time, even for environments with multiple congestion points and sudden changes in the traffic pattern.

Item: Handling Congestion and Routing Failures in Data Center Networking (2015-09-01)
Authors: Stephens, Brent; Cox, Alan L.; Rixner, Scott; Ng, T. S. Eugene; Zhong, Lin
Today's data center networks are made of highly reliable components. Nonetheless, given the current scale of data center networks and the bursty traffic patterns of data center applications, at any given point in time it is likely that the network is experiencing either a routing failure or a congestion failure. This thesis introduces new solutions to each of these problems individually, and the first combined solutions to these problems for data center networks. To solve routing failures, which can lead to both packet loss and a loss of connectivity, this thesis proposes a new approach to local fast failover, which allows traffic to be quickly rerouted. Because forwarding table state limits both the fault tolerance and the largest network size that is implementable given local fast failover, this thesis introduces both a new forwarding table compression algorithm and Plinko, a compressible forwarding model. Combined, these contributions enable forwarding tables that contain routes for all pairs of hosts and that can reroute traffic even given multiple arbitrary link failures, on topologies with tens of thousands of hosts. To solve congestion failures, this thesis presents TCP-Bolt, which uses lossless Ethernet to prevent packets from ever being dropped. Unlike prior work, this thesis demonstrates that enabling lossless Ethernet does not reduce aggregate forwarding throughput in data center networks. Further, this thesis also demonstrates that TCP-Bolt can significantly reduce flow completion times for medium-sized flows by allowing TCP slow-start to be eliminated.
Unfortunately, using lossless Ethernet to solve congestion failures introduces a new failure mode, deadlock, which can render the entire network unusable. No existing fault-tolerant forwarding models are deadlock-free, so this thesis introduces both deadlock-free Plinko and deadlock-free edge-disjoint spanning tree (DF-EDST) resilience, the first deadlock-free fault-tolerant forwarding models for data center networks. This thesis shows that deadlock-free Plinko does not impact forwarding throughput, although the number of virtual channels required by deadlock-free Plinko increases as either topology size or fault tolerance increases. On the other hand, this thesis demonstrates that DF-EDST provides deadlock-free local fast failover without needing virtual channels. This thesis shows that, with DF-EDST resilience, fewer than one in a million of the flows in data center networks with thousands of hosts are expected to fail even given tens of failures. Further, this thesis shows that doing so incurs only a small impact on the maximal achievable aggregate throughput of the network, which is acceptable given the overall decrease in flow completion times achieved by enabling lossless forwarding.

Item: High-Performance Communication Protocols for Asynchronous Duty-Cycling Wireless Networks (2013-11-07)
Authors: Tang, Lei; Johnson, David B.; Ng, T. S. Eugene; Knightly, Edward W.
Duty cycling is a technique for saving energy in resource-limited wireless networks such as sensor networks. With duty cycling, each node periodically switches between active and sleeping states, for example being active for only 1 to 10 percent of the time. Wireless duty-cycling networks face many challenges, such as maintaining high energy efficiency, delivering packets efficiently under dynamic channel conditions, and discovering routes effectively. This thesis presents a series of protocols to address these challenges.
The first part of this thesis presents a new single-channel energy-efficient MAC protocol, called the Predictive-Wakeup MAC (PW-MAC). The key idea behind PW-MAC is to allow each node to wake up asynchronously at randomized times, while enabling senders to predict receiver wakeup times to save energy. Extending the randomized predictive wakeup mechanism of PW-MAC, the second part of this thesis presents a new multichannel energy-efficient MAC protocol, called the Efficient-Multichannel MAC (EM-MAC). EM-MAC enables each node to dynamically optimize the selection of the wireless channels it utilizes based on the channel conditions it senses. By adapting to changing channel conditions, EM-MAC achieves high packet delivery performance. EM-MAC also achieves high energy efficiency through its predictive multichannel wakeup mechanism. Although duty cycling saves energy, I found that, in asynchronous duty-cycling networks, existing on-demand routing protocols tend to discover routes that are much worse than the optimal routes. The last part of this thesis presents four optimization techniques to improve the routes discovered in such networks. These optimizations are fully distributed and work with different route metrics, such as hop count and ETX. Implemented in TinyOS on a testbed of MICAz sensor nodes, PW-MAC achieved the lowest energy consumption and delivery latency among the single-channel protocols, while EM-MAC significantly outperformed all other protocols tested. EM-MAC maintained the lowest duty cycles, the lowest packet delivery latency, and a 100% packet delivery ratio across all experiments, including those with concurrent multihop traffic flows and those with heavy ZigBee and Wi-Fi interference.
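The predictive-wakeup idea behind PW-MAC can be illustrated with a toy pseudo-random schedule. The generator, its constants, and the timing parameters below are assumptions for illustration, not PW-MAC’s actual design:

```python
# Assumed sketch: a receiver derives its randomized wakeup schedule from a
# per-node seed and a deterministic pseudo-random generator. A sender that
# has learned the seed can therefore compute the receiver's next wakeup
# time and sleep until then, instead of idle-listening.

def wakeup_times(seed, base_interval=100, jitter=50, count=10):
    """Wakeup schedule: each interval is base_interval plus LCG jitter."""
    times, state, t = [], seed, 0
    for _ in range(count):
        state = (1103515245 * state + 12345) % (2 ** 31)  # toy LCG step
        t += base_interval + state % jitter
        times.append(t)
    return times

def predict_next_wakeup(seed, now):
    """Sender side: first scheduled wakeup strictly after time `now`."""
    for t in wakeup_times(seed, count=1000):
        if t > now:
            return t
    raise ValueError("prediction horizon exceeded")

# Receiver and sender run the same generator from the shared seed, so the
# sender's prediction lands exactly on the receiver's next wakeup.
receiver_schedule = wakeup_times(42)
predicted = predict_next_wakeup(42, receiver_schedule[3])
```

Because the schedule is deterministic given the seed, wakeups stay unpredictable to outsiders yet perfectly predictable to senders that have synchronized once.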
Finally, in simulations on the ns-2 network simulator, compared with conventional on-demand route discovery, the presented route discovery optimizations substantially improved the routes discovered in asynchronous duty-cycling networks.

Item: High-Performance Data Multicast in Hybrid Data Center Networks (2018-11-30)
Authors: Sun, Xiaoye Steven; Ng, T. S. Eugene
Nowadays, a significant number of big data processing applications, such as machine learning algorithms and database queries, are implemented on top of various distributed big data processing frameworks. The distributed computation logic in these applications relies greatly on data multicast, a data transfer pattern in which a piece of data is delivered to multiple destination servers. However, in these distributed frameworks, the state-of-the-art data multicast mechanisms are all based on application-layer multicast, in which data is delivered through unicast flows on top of an overlay network. This thesis proposes high-performance system components that solve the data multicast issue by leveraging hybrid data center networks. In a hybrid data center network, the racks are connected via a circuit switch (or a circuit-switched network) in addition to the traditional packet-switched network. Circuit switches fundamentally change the multicast communication capability among the servers, since they can be extended to support physical-layer multicast. This thesis achieves the goal of high performance from two critical aspects: multicast data transfer and multicast data scheduling. In the first part, the thesis presents Republic, a complete platform providing a high-performance "data multicast service" for applications running in hybrid data centers. Republic consists of a Republic agent daemon running on each of the servers and a centralized Republic manager.
The Republic agent (1) exposes a unified Republic API to the applications using the data multicast service, (2) talks with the Republic manager to request and return network resources for data multicast, and (3) carries out multicast data transfer efficiently and reliably. The Republic manager takes the multicast data scheduling algorithm as a plug-in module. Republic is implemented and deployed in a hybrid data center testbed. The testbed evaluation shows that Republic can improve data multicast in Apache Spark machine learning applications by as much as 4.0 times. In the second part, the thesis tackles the problem of scheduling multicast data transfer in a high-bandwidth circuit switch. The scheduling algorithm adopts the approaches of multi-hopping and segmented transfer. It aims at minimizing the average demand completion time in order to deliver the most benefit to the applications. The algorithm exhibits up to a 13.4-times improvement compared with the state-of-the-art solution.

Item: Improving user authentication on the web: Protected login, strong sessions, and identity federation (2014-01-14)
Authors: Dietz, Mike; Wallach, Daniel S.; Ng, T. S. Eugene; Koushanfar, Farinaz
Client authentication on the web has remained in the internet-equivalent of the stone ages for the last two decades. Instead of adopting modern public-key-based authentication mechanisms, we seem to be stuck with traditional methods like passwords and cookies. These authentication methods are vulnerable to a wide range of attacks, from simple password reuse to strong man-in-the-middle attackers that can inject themselves into the middle of encrypted communication channels. While many potential solutions have been proposed to solve the issues with the use of passwords and cookies for web authentication, most have failed to take hold. This lack of adoption stems from two issues. First, traditional password-based authentication provides a very simple user experience.
Any new technique must neither increase user friction during login nor degrade the user experience. Second, a new authentication technique must not be difficult to implement in existing browsers and web applications or to deploy to users. This thesis presents three techniques that provide protection against strong attackers while preserving a low-friction user experience. The first, Origin Bound Certificates, is a session-hardening technique that cryptographically binds the user's authentication cookie to the TLS channel over which the cookie is presented. This technique protects a user's session against strong attackers, requires no additional user interaction, requires little (or no) modification to existing web applications, and is compatible with existing data center infrastructure such as TLS terminators. The second, Opportunistic Cryptographic Identity Assertions, is a technique in which the web browser communicates with a user's cell phone in order to establish it as an opportunistic second factor in the initial login operation. This technique provides security assurances comparable to or greater than conventional two-factor authentication (i.e., phishing and password reuse prevention) while offering a simple user experience. Finally, I discuss a new federated login system that makes use of a new browser-provided construct called the PostKey API. This interface allows the browser to create a cross-certification that asserts ownership of client-side keys to a trusted third party. These cross-certifications can be verified by an identity provider and used to harden existing federated login protocols, as well as to create a new federation protocol that is resistant to man-in-the-middle attacks and leaked authentication tokens and that provides relying parties with the means to better secure communication with the user.

Item: Leaky Buffer: A Novel Abstraction for Relieving Memory Pressure from Cluster Data Processing Frameworks (2016-03-25)
Authors: Liu, Zhaolei; Ng, T. S. Eugene
The shift to the in-memory data processing paradigm has had a major influence on the development of cluster data processing frameworks. Numerous frameworks from industry, the open source community, and academia are adopting the in-memory paradigm to achieve functionality and performance breakthroughs. However, despite the advantages of these in-memory frameworks, in practice they are susceptible to memory-pressure-related performance collapse and failures. The contributions of this paper are two-fold. Firstly, we conduct a detailed diagnosis of the memory pressure problem and identify three preconditions for the performance collapse. These preconditions not only explain the problem but also shed light on possible solution strategies. Secondly, we propose a novel programming abstraction called the leaky buffer that eliminates one of the preconditions, thereby addressing the underlying problem. We have implemented the leaky buffer abstraction in Spark. Experiments on a range of memory-intensive aggregation operations show that the leaky buffer abstraction can drastically reduce the occurrence of memory-related failures, improve performance by up to 507%, and reduce memory usage by up to 87.5%.

Item: Leaky Buffer: A Novel Abstraction for Relieving Memory Pressure from Cluster Data Processing Frameworks (2015-07-13)
Authors: Liu, Zhaolei; Ng, T. S. Eugene; Cox, Alan L.; Jermaine, Christopher M.
The shift to the in-memory data processing paradigm has had a major influence on the development of cluster data processing frameworks. Numerous frameworks from industry, the open source community, and academia are adopting the in-memory paradigm to achieve functionality and performance breakthroughs. However, despite the advantages of these in-memory frameworks, in practice they are susceptible to memory-pressure-related performance collapse and failures. The contributions of this thesis are two-fold.
Firstly, we conduct a detailed diagnosis of the memory pressure problem and identify three preconditions for the performance collapse. These preconditions not only explain the problem but also shed light on possible solution strategies. Secondly, we propose a novel programming abstraction called the leaky buffer that eliminates one of the preconditions, thereby addressing the underlying problem. We have implemented the leaky buffer abstraction in Spark for two distinct use cases. Experiments on a range of memory-intensive aggregation operations show that the leaky buffer abstraction can drastically reduce the occurrence of memory-related failures, improve performance by up to 507%, and reduce memory usage by up to 87.5%.

Item: Maestro: A System for Scalable OpenFlow Control (2010-12-04)
Authors: Cai, Zheng; Cox, Alan L.; Ng, T. S. Eugene
The fundamental feature of an OpenFlow network is that the controller is responsible for the initial establishment of every flow by contacting the related switches. Thus the performance of the controller can be a bottleneck. This paper shows how this fundamental problem is addressed through parallelism. The state-of-the-art OpenFlow controller, called NOX, achieves a simple programming model for control function development by using a single-threaded event loop, but it does not exploit parallelism. We propose Maestro, which keeps the simple programming model for programmers while exploiting parallelism throughout, together with additional throughput optimization techniques. We experimentally show that the throughput of Maestro achieves near-linear scalability on an eight-core server machine.

Item: Maestro: Achieving scalability and coordination in centralized network control plane (2012)
Authors: Cai, Zheng; Ng, T. S. Eugene
A modern network control plane that supports versatile communication services (e.g., performance differentiation, access control, virtualization) is highly complex.
Different control components, such as routing protocols, security policy enforcers, resource allocation planners, and quality of service modules, interact with each other in the control plane to realize complicated control objectives. These components need to coordinate their actions, and they may even have conflicting goals that require careful handling. Furthermore, many of these components are distributed protocols running on a large number of network devices. Because protocol state is distributed in the network, it is very difficult to tightly coordinate the actions of these distributed control components, so inconsistent control actions can create serious problems in the network. As a result, such complexity makes it very difficult to ensure optimality and consistency across all the components. To address the complexity of the network control plane, researchers have proposed different approaches; among these, the centralized control plane architecture has become widely accepted as a key to solving the problem. By centralizing the control functionality into a single management station, we can minimize the state distributed in the network and thus have better control over the consistency of that state. However, the centralized architecture has fundamental limitations. First, the centralized architecture is more difficult to scale up to large network sizes or high request rates. In addition, it is equally important to fairly service requests and maintain low request-handling latency while at the same time achieving highly scalable throughput. Second, centralized routing control is neither as responsive nor as robust to failures as distributed routing protocols. To enhance responsiveness and robustness, one approach is to coordinate the centralized control plane with distributed routing protocols.
In this thesis, we develop a centralized network control system, called Maestro, to address the fundamental limitations of the centralized network control plane. First, we use Maestro as the central controller for a flow-based routing network, in which a large number of requests are sent to the controller at a very high rate for processing. Such a network requires the central controller to be extremely scalable. Using Maestro, we systematically explore and study multiple design choices to optimally utilize modern multi-core processors, to fairly distribute computation resources, and to efficiently amortize unavoidable overhead. We show that a Maestro design based on the abstraction that each individual thread services switches in a round-robin manner can achieve excellent throughput scalability while maintaining far superior and near-optimal max-min fairness. At the same time, low latency even at high throughput is achieved by Maestro's workload-adaptive request batching. Second, we use Maestro to coordinate centralized controls and distributed routing protocols in a network, realizing a hybrid control plane framework that is more responsive and robust than a purely centralized control plane, and more globally optimized and consistent than a purely distributed control plane. Effectively, we obtain the advantages of both the centralized and the distributed solutions. Through experimental evaluations, we show that such coordination between centralized controls and distributed routing protocols can improve the SLA compliance of the entire network.Item Maestro: Balancing Fairness, Latency and Throughput in the OpenFlow Control Plane(2011-12-20) Cai, Zheng; Cox, Alan L.; Ng, T. S. EugeneThe fundamental feature of an OpenFlow network is that the controller is responsible for the configuration of switches for every traffic flow.
This feature brings programmability and flexibility, but it also makes the controller critical to the performance of an OpenFlow network. Fairly servicing requests from different switches, achieving low request-handling latency, and scaling effectively on multi-core processors are fundamental controller design requirements. With these requirements in mind, we explore multiple workload distribution designs within our system, called Maestro. These designs are evaluated against the requirements, together with the static partitioning and static batching designs found in other available multi-threaded controllers, NOX and Beacon. We find that a Maestro design based on the abstraction that each individual thread services switches in a round-robin manner can achieve excellent throughput scalability (second only to another Maestro design) while maintaining far superior and near-optimal max-min fairness. At the same time, low latency even at high throughput is achieved thanks to Maestro's workload-adaptive request batching.
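The round-robin servicing with request batching described in the two Maestro abstracts can be illustrated with a minimal sketch. This is not Maestro's actual code; the class, method, and parameter names below (`RoundRobinWorker`, `run_one_cycle`, `batch_size`) are invented for illustration. The idea it models: one worker thread visits its assigned switches in round-robin order and drains at most a small batch of pending requests per visit, which bounds how long any one switch waits (fairness) while amortizing per-wakeup overhead (throughput).

```python
from collections import deque


class RoundRobinWorker:
    """Illustrative model of one controller worker thread: it visits its
    assigned switches in round-robin order, handling at most `batch_size`
    pending requests from each switch per visit. A busy switch therefore
    cannot starve the others, and batching amortizes per-visit overhead."""

    def __init__(self, batch_size=8):
        self.batch_size = batch_size
        self.switches = deque()   # visit order for round-robin
        self.queues = {}          # switch id -> queue of pending requests

    def add_switch(self, switch_id):
        self.switches.append(switch_id)
        self.queues[switch_id] = deque()

    def enqueue(self, switch_id, request):
        self.queues[switch_id].append(request)

    def run_one_cycle(self, handle):
        """One round-robin pass over all switches."""
        for _ in range(len(self.switches)):
            sw = self.switches[0]
            self.switches.rotate(-1)  # next visit starts at the next switch
            q = self.queues[sw]
            # Drain up to batch_size requests that are available right now;
            # never idle-wait on one switch while others have pending work.
            for _ in range(min(self.batch_size, len(q))):
                handle(sw, q.popleft())
```

With `batch_size=2`, a switch holding three pending requests yields only two of them in a cycle before the worker moves on, so a lightly loaded switch is still serviced in the same pass; the leftover request is picked up on the next cycle.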