Marrying Application-Level Opportunities with Algorithm-Hardware Co-Design towards Ubiquitous Edge Intelligence

Zhao, Yang

dc.contributor.advisor	Lin, Yingyan	en_US
dc.creator	Zhao, Yang	en_US
dc.date.accessioned	2023-06-15T18:21:27Z	en_US
dc.date.created	2023-05	en_US
dc.date.issued	2023-04-21	en_US
dc.date.submitted	May 2023	en_US
dc.date.updated	2023-06-15T18:21:27Z	en_US
dc.description.abstract	Artificial Intelligence (AI) algorithms, especially Deep Neural Networks (DNNs), have recently achieved record-breaking performance (i.e., task accuracy) in a wide range of applications. This has motivated a growing demand for bringing powerful AI-powered functionalities to edge devices, such as Virtual Reality/Augmented Reality (VR/AR) and medical devices, towards ubiquitous edge intelligence. On the other hand, the powerful performance of AI algorithms comes with much higher computational complexity and memory storage requirements, which are at odds with the limited compute/storage resources on edge devices. To close this gap and enable more extensive AI-powered edge intelligence, we advocate harmonizing AI algorithms and dedicated accelerators via algorithm-accelerator co-design and leveraging application-level opportunities to minimize redundant computations in the processing pipeline. First, to tackle the efficiency bottleneck caused by the massive dynamic random-access memory (DRAM) accesses required when accelerating DNNs, we propose an algorithm-accelerator co-design technique called SmartExchange that trades higher-cost memory storage/accesses for lower-cost computations, boosting the acceleration efficiency of both DNN inference and training. In particular, on the algorithm level, we enforce a hardware-friendly DNN weight structure, where only a small basis matrix and a sparse and readily-quantized coefficient matrix need to be stored for each layer, and the remaining majority of weights can be recovered through lower-cost computations. On the hardware level, we further design a dedicated accelerator that leverages the SmartExchange-enforced algorithm structure to improve both the energy efficiency and processing latency of acceleration. 
Second, motivated by the promising results achieved by the above algorithm-accelerator co-design technique SmartExchange, we explore and develop dedicated algorithm-accelerator co-design techniques for two real-world applications, which are further empowered with application-level opportunities to maximize the achievable efficiency. In particular, we consider two representative and increasingly demanded AI-powered intelligent applications: eye tracking on VR/AR devices, which estimates the gaze directions of human eyes, and cardiac detection on medical implants, which performs intracardiac electrogram (EGM) to electrocardiogram (ECG) conversion (i.e., EGM-to-ECG conversion) on pacemakers. In both applications, we find that there consistently exist application-level opportunities that can be leveraged to largely reduce the computation/data movement redundancy within the processing pipeline. Therefore, we develop a tailored processing pipeline for each application and then pair it with dedicated algorithm-accelerator co-design techniques to further boost the overall system efficiency while maintaining the task performance, as elaborated below. For the eye tracking application, we propose a predict-then-focus pipeline that first extracts regions of interest (ROIs), which are on average only 24% of the original eye images, for gaze estimation to reduce computational redundancy. Additionally, the temporal correlation of eyes across frames is leveraged so that only 5% of the frames require ROI adjustment over time. On top of these, we develop a dedicated accelerator and integrate both the algorithm and accelerator into a real hardware prototype system, dubbed i-FlatCam, consisting of a lensless camera and a chip prototype fabricated in a 28nm CMOS technology for validation. 
Real-hardware measurements show that the i-FlatCam system is the first to simultaneously meet all three requirements of eye tracking for next-generation AR/VR devices. After that, we take another big leap by accelerating a pipeline that involves eye segmentation, towards more general eye tracking in AR/VR, where the segmentation result can enable more downstream tasks in addition to eye tracking. The resulting system is called EyeCoD and is validated with a multi-chip hardware prototype setting. For the EGM-to-ECG conversion application, we propose an application-aware processing pipeline where a precise and more complex conversion algorithm is invoked only in immediate response to a detected anomaly (i.e., arrhythmia), and a coarse conversion is activated otherwise to avoid unnecessary computations. Furthermore, we develop a dedicated accelerator called e-G2C, which is tailored for the above processing pipeline to further boost energy efficiency. For evaluation, the e-G2C processor is fabricated in a 28nm CMOS technology and achieves an energy efficiency of 0.14-8.31 μJ/inference, outperforming prior art under similar complexity, enabling real-time detection/conversion, and promising possibly life-critical interventions.	en_US
dc.embargo.lift	2023-11-01	en_US
dc.embargo.terms	2023-11-01	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	Zhao, Yang. "Marrying Application-Level Opportunities with Algorithm-Hardware Co-Design towards Ubiquitous Edge Intelligence." (2023) Diss., Rice University. https://hdl.handle.net/1911/114913.	en_US
dc.identifier.uri	https://hdl.handle.net/1911/114913	en_US
dc.language.iso	eng	en_US
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.subject	Artificial Intelligence	en_US
dc.subject	Deep Neural Network	en_US
dc.subject	Hardware acceleration	en_US
dc.subject	Algorithm-hardware co-design	en_US
dc.subject	Integrated Circuit	en_US
dc.title	Marrying Application-Level Opportunities with Algorithm-Hardware Co-Design towards Ubiquitous Edge Intelligence	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Electrical and Computer Engineering	en_US
thesis.degree.discipline	Engineering	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	Doctor of Philosophy	en_US
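
Illustrative note on the abstract: the SmartExchange weight structure described above (a small basis matrix plus a sparse, readily-quantized coefficient matrix per layer, with the full weights recovered by computation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the matrix shapes, the pruning threshold, the power-of-two quantization scheme, and the function names `quantize_to_power_of_two` and `rebuild_weights` are all hypothetical choices made for illustration.

```python
import numpy as np


def quantize_to_power_of_two(x, threshold):
    """Zero out entries below `threshold` in magnitude and snap the rest to a
    signed power of two by rounding the base-2 exponent.
    (Assumed quantization scheme; the abstract only says 'readily-quantized'.)"""
    out = np.zeros_like(x, dtype=float)
    mask = np.abs(x) >= threshold
    exponents = np.round(np.log2(np.abs(x[mask])))
    out[mask] = np.sign(x[mask]) * np.exp2(exponents)
    return out


def rebuild_weights(coeff, basis):
    """Recover the full layer weights from the stored factors by computation."""
    return coeff @ basis


# Hypothetical layer: 64 x 8 weights stored as a 64 x 4 sparse coefficient
# matrix and a small dense 4 x 8 basis matrix.
rng = np.random.default_rng(0)
basis = rng.standard_normal((4, 8))
coeff = quantize_to_power_of_two(rng.standard_normal((64, 4)), threshold=0.5)

weights = rebuild_weights(coeff, basis)

# Values that must be kept in memory versus values in the rebuilt matrix.
stored_values = np.count_nonzero(coeff) + basis.size
print(f"stored values: {stored_values}, rebuilt weights: {weights.size}")
```

In this sketch only the nonzero coefficients and the basis entries are stored, and each weight is recomputed on demand by a small matrix product, which mirrors the stated idea of trading memory storage/accesses for lower-cost computation.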

Files

Original bundle

Name: ZHAO-DOCUMENT-2023.pdf
Size: 10.87 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text

Collections

Rice University Theses and Dissertations