High-Performance Data Multicast in Hybrid Data Center Networks

Date
2018-11-30
Abstract

Nowadays, a significant number of big data processing applications, such as machine learning algorithms and database queries, are implemented on top of various distributed big data processing frameworks. The distributed computation logic in these applications relies heavily on data multicast, a data transfer pattern in which a piece of data is delivered to multiple destination servers. However, in these distributed frameworks, the state-of-the-art data multicast mechanisms are all based on application-layer multicast, in which data is delivered through unicast flows on top of an overlay network. This thesis proposes high-performance system components that solve the data multicast issue by leveraging hybrid data center networks.

In a hybrid data center network, the racks are connected via a circuit switch (or a circuit-switched network) in addition to the traditional packet-switched network. Circuit switches fundamentally change the multicast communication capability among the servers, since they can be extended to support physical-layer multicast. This thesis pursues high performance through two critical aspects: multicast data transfer and multicast data scheduling.

In the first part, the thesis presents Republic, a complete platform providing a high-performance "data multicast service" for applications running in hybrid data centers. Republic consists of a Republic agent daemon running on each server and a centralized Republic manager. The Republic agent (1) exposes a unified Republic API to the applications using the data multicast service, (2) talks with the Republic manager to request and return network resources for data multicast, and (3) performs multicast data transfer efficiently and reliably. The Republic manager takes the multicast data scheduling algorithm as a plug-in module. Republic is implemented and deployed in a hybrid data center testbed. The testbed evaluation shows that Republic can improve data multicast in Apache Spark machine learning applications by as much as 4.0 times.
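
The request-and-return interaction between agents and the manager, with the scheduling algorithm supplied as a plug-in, can be pictured with a minimal sketch. Everything named here (`Manager`, `request`, `grant_next`, `release`, `fifo_scheduler`) is hypothetical and chosen for illustration only; it is not the Republic API, and the first-come-first-served policy merely stands in for the scheduling algorithm studied in the thesis. The sketch also assumes a single shared circuit resource.

```python
from typing import Callable, List, Optional

# Hypothetical plug-in type: given the pending requests, pick the next one to serve.
Scheduler = Callable[[List[dict]], Optional[dict]]


def fifo_scheduler(pending: List[dict]) -> Optional[dict]:
    """Placeholder policy (an assumption): serve requests in arrival order."""
    return pending[0] if pending else None


class Manager:
    """Toy centralized manager that owns one circuit resource (an assumption)."""

    def __init__(self, scheduler: Scheduler) -> None:
        self.scheduler = scheduler      # scheduling algorithm as a plug-in
        self.pending: List[dict] = []   # requests waiting for the circuit
        self.circuit_busy = False

    def request(self, sender: str, receivers: List[str]) -> dict:
        """Called by an agent to ask for network resources for one multicast."""
        req = {"sender": sender, "receivers": list(receivers)}
        self.pending.append(req)
        return req

    def grant_next(self) -> Optional[dict]:
        """Grant the circuit to the request chosen by the plug-in, if it is free."""
        if self.circuit_busy:
            return None
        req = self.scheduler(self.pending)
        if req is not None:
            self.pending.remove(req)
            self.circuit_busy = True
        return req

    def release(self) -> None:
        """Called by an agent to return the network resources after the transfer."""
        self.circuit_busy = False
```

Swapping `fifo_scheduler` for another function with the same signature changes the policy without touching the manager, which is the point of treating the scheduler as a plug-in.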

In the second part, the thesis tackles the problem of scheduling multicast data transfer in a high-bandwidth circuit switch. The scheduling algorithm adopts the approaches of multi-hopping and segmented transfer. It aims to minimize the average demand completion time in order to deliver the most benefit to the applications. The algorithm exhibits up to 13.4 times improvement compared with the state-of-the-art solution.
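
To make the objective concrete, the following sketch computes the average demand completion time for demands served one after another on a single resource. It is a simplified illustration, not the thesis's algorithm: it ignores multi-hopping and segmented transfer, and the demand durations are made-up values.

```python
from itertools import accumulate
from typing import List


def average_completion_time(durations: List[float]) -> float:
    """Average finish time when demands are served sequentially in the given order."""
    finish_times = list(accumulate(durations))  # running sum = finish time of each demand
    return sum(finish_times) / len(finish_times)


# Hypothetical transfer durations (seconds) for three multicast demands.
demands = [8.0, 1.0, 3.0]

arrival_order = average_completion_time(demands)           # serve in the listed order
shortest_first = average_completion_time(sorted(demands))  # serve shortest demand first
```

In this toy setting, serving the shortest demand first gives the lower average, because a long demand scheduled early delays every demand behind it. The order of service therefore matters for this metric even when the total transfer time is unchanged.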

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Data center network, data multicast, circuit switch, network traffic scheduling
Citation

Sun, Xiaoye Steven. "High-Performance Data Multicast in Hybrid Data Center Networks." (2018) Diss., Rice University. https://hdl.handle.net/1911/105887.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.