Browsing by Author "Mirhoseini, Azalia"
Now showing 1 - 4 of 4
Results Per Page
Sort Options
Item A Data and Platform-Aware Framework For Large-Scale Machine Learning(2015-04-24) Mirhoseini, Azalia; Koushanfar, Farinaz; Aazhang, Behnaam; Baraniuk, Richard; Jermaine, ChristopherThis thesis introduces a novel framework for execution of a broad class of iterative machine learning algorithms on massive and dense (non-sparse) datasets. Several classes of critical and fast-growing data, including image and video content, contain dense dependencies. Current pursuits are overwhelmed by the excessive computation, memory access, and inter-processor communication overhead incurred by processing dense data. On the one hand, solutions that employ data-aware processing techniques produce transformations that are oblivious to the overhead created on the underlying computing platform. On the other hand, solutions that leverage platform-aware approaches do not exploit the non-apparent data geometry. My work is the first to develop a comprehensive data- and platform-aware solution that provably optimizes the cost (in terms of runtime, energy, power, and memory usage) of iterative learning analysis on dense data. My solution is founded on a novel tunable data transformation methodology that can be customized with respect to the underlying computing resources and constraints. My key contributions include: (i) introducing a scalable and parametric data transformation methodology that leverages coarse-grained parallelism in the data to create versatile and tunable data representations, (ii) developing automated methods for quantifying platform-specific computing costs in distributed settings, (iii) devising optimally-bounded partitioning and distributed flow scheduling techniques for running iterative updates on dense correlation matrices, (iv) devising methods that enable transforming and learning on streaming dense data, and (v) providing user-friendly open-source APIs that facilitate adoption of my solution on multiple platforms including (multi-core and many-core) CPUs and FPGAs. Several learning algorithms such as regularized regression, cone optimization, and power iteration can be readily solved using my APIs. My solutions are evaluated on a number of learning applications including image classification, super-resolution, and denoising. I perform experiments on various real-world datasets with up to 5 billion non-zeros on a range of computing platforms including Intel i7 CPUs, Amazon EC2, IBM iDataPlex, and Xilinx Virtex-6 FPGAs. I demonstrate that my framework can achieve up to 2 orders of magnitude performance improvement in comparison with current state-of-the-art solutions.Item A Unified Framework for Multimodal IC Trojan Detection(2010-02-02) Alkabani, Yousra; Koushanfar, Farinaz; Mirhoseini, AzaliaThis paper presents a unified formal framework for integrated circuits (IC) Trojan detection that can simultaneously employ multiple noninvasive measurement types. Hardware Trojans refer to modifications, alterations, or insertions to the original IC for adversarial purposes. The new framework formally defines the IC Trojan detection for each measurement type as an optimization problem and discusses the complexity. A formulation of the problem that is applicable to a large class of Trojan detection problems and is submodular is devised. Based on the objective function properties, an efficient Trojan detection method with strong approximation and optimality guarantees is introduced. Signal processing methods for calibrating the impact of inter-chip and intra-chip correlations are presented. We define a new sensitivity metric which formally quantifies the impact of modifications to each gate on the Trojan detection. Using the new metric, we compare the Trojan detection capability of the different measurement types for static (quiescent) current, dynamic (transient) current, and timing (delay) measurements. We propose a number of methods for combining the detections of the different measurement types and show how the sensitivity results can be used for a systematic combining of the detection results. Experimental evaluations on benchmark designs reveal the low-overhead and effectiveness of the new Trojan detection framework and provides a comparison of different detection combining methods.Item Coding for Phase Change Memory Performance Optimization(2012-09-05) Mirhoseini, Azalia; Koushanfar, Farinaz; Baraniuk, Richard G.; Aazhang, BehnaamOver the past several decades, memory technologies have exploited continual scaling of CMOS to drastically improve performance and cost. Unfortunately, charge-based memories become unreliable beyond 20 nm feature sizes. A promising alternative is Phase-Change-Memory (PCM) which leverages scalable resistive thermal mechanisms. To realize PCM's potential, a number of challenges, including the limited wear-endurance and costly writes, need to be addressed. This thesis introduces novel methodologies for encoding data on PCM which exploit asymmetries in read/write performance to minimize memory's wear/energy consumption. First, we map the problem to a distance-based graph clustering problem and prove it is NP-hard. Next, we propose two different approaches: an optimal solution based on Integer-Linear-Programming, and an approximately-optimal solution based on Dynamic-Programming. Our methods target both single-level and multi-level cell PCM and provide further optimizations for stochastically-distributed data. We devise a low overhead hardware architecture for the encoder. Evaluations demonstrate significant performance gains of our framework.Item Partitioned machine learning architecture(2024-03-05) Rouhani, Bita Darvish; Mirhoseini, Azalia; Koushanfar, Farinaz; Rice University; United States Patent and Trademark OfficeA system may include a processor and a memory. The memory may include program code that provides operations when executed by the processor. The operations may include: partitioning, based at least on a resource constraint of a platform, a global machine learning model into a plurality of local machine learning models; transforming training data to at least conform to the resource constraint of the platform; and training the global machine learning model by at least processing, at the platform, the transformed training data with a first of the plurality of local machine learning models.