A Data and Platform-Aware Framework For Large-Scale Machine Learning

dc.contributor.advisor: Koushanfar, Farinaz
dc.contributor.committeeMember: Aazhang, Behnaam
dc.contributor.committeeMember: Baraniuk, Richard
dc.contributor.committeeMember: Jermaine, Christopher
dc.creator: Mirhoseini, Azalia
dc.date.accessioned: 2016-01-27T22:46:20Z
dc.date.available: 2016-01-27T22:46:20Z
dc.date.created: 2015-05
dc.date.issued: 2015-04-24
dc.date.submitted: May 2015
dc.date.updated: 2016-01-27T22:46:20Z
dc.description.abstract: This thesis introduces a novel framework for executing a broad class of iterative machine learning algorithms on massive, dense (non-sparse) datasets. Several classes of critical and fast-growing data, including image and video content, contain dense dependencies. Current approaches are overwhelmed by the excessive computation, memory access, and inter-processor communication overhead incurred by processing dense data. On the one hand, solutions that employ data-aware processing techniques produce transformations that are oblivious to the overhead they create on the underlying computing platform. On the other hand, solutions that leverage platform-aware approaches do not exploit the non-apparent data geometry. My work is the first to develop a comprehensive data- and platform-aware solution that provably optimizes the cost (in terms of runtime, energy, power, and memory usage) of iterative learning analysis on dense data. My solution is founded on a novel tunable data transformation methodology that can be customized with respect to the underlying computing resources and constraints. My key contributions include: (i) introducing a scalable and parametric data transformation methodology that leverages coarse-grained parallelism in the data to create versatile and tunable data representations, (ii) developing automated methods for quantifying platform-specific computing costs in distributed settings, (iii) devising optimally bounded partitioning and distributed flow scheduling techniques for running iterative updates on dense correlation matrices, (iv) devising methods that enable transforming and learning on streaming dense data, and (v) providing user-friendly open-source APIs that facilitate adoption of my solution on multiple platforms, including (multi-core and many-core) CPUs and FPGAs. Several learning algorithms, such as regularized regression, cone optimization, and power iteration, can be readily implemented using my APIs.
My solutions are evaluated on a number of learning applications, including image classification, super-resolution, and denoising. I perform experiments on various real-world datasets with up to 5 billion non-zeros on a range of computing platforms, including Intel i7 CPUs, Amazon EC2, IBM iDataPlex, and Xilinx Virtex-6 FPGAs. I demonstrate that my framework can achieve up to two orders of magnitude performance improvement in comparison with current state-of-the-art solutions.
dc.format.mimetype: application/pdf
dc.identifier.citation: Mirhoseini, Azalia. "A Data and Platform-Aware Framework For Large-Scale Machine Learning." (2015) Diss., Rice University. https://hdl.handle.net/1911/88212.
dc.identifier.uri: https://hdl.handle.net/1911/88212
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Big Data
dc.subject: Machine Learning
dc.subject: Data-Aware
dc.subject: Platform-Aware
dc.subject: Distributed optimization
dc.subject: Dense Data
dc.title: A Data and Platform-Aware Framework For Large-Scale Machine Learning
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: MIRHOSEINI-DOCUMENT-2015.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.85 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text