A Data and Platform-Aware Framework For Large-Scale Machine Learning

dc.contributor.advisor: Koushanfar, Farinaz [en_US]
dc.contributor.committeeMember: Aazhang, Behnaam [en_US]
dc.contributor.committeeMember: Baraniuk, Richard [en_US]
dc.contributor.committeeMember: Jermaine, Christopher [en_US]
dc.creator: Mirhoseini, Azalia [en_US]
dc.date.accessioned: 2016-01-27T22:46:20Z [en_US]
dc.date.available: 2016-01-27T22:46:20Z [en_US]
dc.date.created: 2015-05 [en_US]
dc.date.issued: 2015-04-24 [en_US]
dc.date.submitted: May 2015 [en_US]
dc.date.updated: 2016-01-27T22:46:20Z [en_US]
dc.description.abstract: This thesis introduces a novel framework for execution of a broad class of iterative machine learning algorithms on massive and dense (non-sparse) datasets. Several classes of critical and fast-growing data, including image and video content, contain dense dependencies. Current pursuits are overwhelmed by the excessive computation, memory access, and inter-processor communication overhead incurred by processing dense data. On the one hand, solutions that employ data-aware processing techniques produce transformations that are oblivious to the overhead created on the underlying computing platform. On the other hand, solutions that leverage platform-aware approaches do not exploit the non-apparent data geometry. My work is the first to develop a comprehensive data- and platform-aware solution that provably optimizes the cost (in terms of runtime, energy, power, and memory usage) of iterative learning analysis on dense data. My solution is founded on a novel tunable data transformation methodology that can be customized with respect to the underlying computing resources and constraints. My key contributions include: (i) introducing a scalable and parametric data transformation methodology that leverages coarse-grained parallelism in the data to create versatile and tunable data representations, (ii) developing automated methods for quantifying platform-specific computing costs in distributed settings, (iii) devising optimally-bounded partitioning and distributed flow scheduling techniques for running iterative updates on dense correlation matrices, (iv) devising methods that enable transforming and learning on streaming dense data, and (v) providing user-friendly open-source APIs that facilitate adoption of my solution on multiple platforms including (multi-core and many-core) CPUs and FPGAs. Several learning algorithms such as regularized regression, cone optimization, and power iteration can be readily solved using my APIs.
My solutions are evaluated on a number of learning applications including image classification, super-resolution, and denoising. I perform experiments on various real-world datasets with up to 5 billion non-zeros on a range of computing platforms including Intel i7 CPUs, Amazon EC2, IBM iDataPlex, and Xilinx Virtex-6 FPGAs. I demonstrate that my framework can achieve up to 2 orders of magnitude performance improvement in comparison with current state-of-the-art solutions. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Mirhoseini, Azalia. "A Data and Platform-Aware Framework For Large-Scale Machine Learning." (2015) Diss., Rice University. https://hdl.handle.net/1911/88212. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/88212 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Big Data [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Data-Aware [en_US]
dc.subject: Platform-Aware [en_US]
dc.subject: Distributed optimization [en_US]
dc.subject: Dense Data [en_US]
dc.title: A Data and Platform-Aware Framework For Large-Scale Machine Learning [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Electrical and Computer Engineering [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
Files
Original bundle
Name: MIRHOSEINI-DOCUMENT-2015.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.85 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text