A Resource-Aware Streaming-based Framework for Big Data Analysis

Date
2015-12-02
Abstract

The ever-growing body of digital data is challenging conventional analytical techniques in machine learning, computer vision, and signal processing. Traditional analytical methods have mainly been developed on the assumption that designers can work with data within the confines of their own computing environment. The growth of big data, however, is changing that paradigm, especially in scenarios where severe memory and computational resource constraints exist. This thesis aims to address major challenges in big data learning by devising a new customizable computing framework that holistically takes into account the data structure and the constraints of the underlying platform. It targets a widely used class of analytical algorithms that model data dependencies by iteratively updating a set of matrix parameters, including but not limited to most regression methods, expectation maximization, and stochastic optimization, as well as emerging deep learning techniques. The key to our approach is a customizable, streaming-based data projection methodology that adaptively transforms data into a new lower-dimensional embedding by simultaneously considering both data and hardware characteristics. It enables scalable data analysis and rapid prototyping of an arbitrary matrix-based learning task using a sparse approximation of the collection that is constantly updated in line with data arrival. Our work is supported by a set of user-friendly Application Programming Interfaces (APIs) that ensure automated adaptation of the proposed framework to various datasets and System on Chip (SoC) platforms, including CPUs, GPUs, and FPGAs. Proof-of-concept evaluations using a variety of large contemporary datasets corroborate the practicability and scalability of our approach in resource-limited settings.
For instance, our results demonstrate a 50-fold improvement over the best-known prior art in terms of memory, energy, power, and runtime for training and executing deep learning models in different sensing applications, including indoor localization and speech recognition, on the constrained embedded platforms used in today's IoT-enabled devices such as autonomous vehicles, robots, and smartphones.

Degree
Master of Science
Type
Thesis
Keywords
Streaming model, Big data, Dense matrix, Low-rank approximation, HW/SW co-design, Deep Learning, Scalable machine learning
Citation

Darvish Rouhani, Bita. "A Resource-Aware Streaming-based Framework for Big Data Analysis." (2015) Master’s Thesis, Rice University. https://hdl.handle.net/1911/87764.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.