Browsing by Author "Brogioli, Michael C."
Item Compiler Driven Architecture Design Space Exploration for DSP Workloads: A Study in Software Programmability Versus Hardware Acceleration (IEEE, 2009-11-01)
Brogioli, Michael C.; Cavallaro, Joseph R.; Center for Multimedia Communication
Wireless communications and video kernels contain vast instruction and data level parallelism that can far outstrip the capabilities of programmable high performance DSPs. Hardware acceleration of these bottlenecks is commonly done at the cost of software flexibility. Many vendors, however, view software as intellectual property and prefer a software solution that is a proprietary implementation. The paper uses a research compiler for architectural design space exploration to compare compiler generated, scalable, software programmable DSP architectures with hardware acceleration implementations. It shows that scaled up, compiler generated, software programmable DSP architectures can be attractive alternatives to non-programmable hardware acceleration.

Item Design and Analysis of Heterogeneous DSP/FPGA Based Architectures for 3GPP Wireless Systems (IEEE, 2006-04-01)
Brogioli, Michael C.; Gadhiok, Manik; Cavallaro, Joseph R.; Center for Multimedia Communication
This paper shows how iterative hardware/software partitioning in heterogeneous DSP/FPGA based embedded systems can be utilized to meet the real-time deadlines of modern 3GPP wireless equalization workloads. By utilizing a well defined set of application partitioning criteria in tandem with SOC simulation tools, we are able to show a greater than sixfold improvement in application performance and ultimately meet, and even exceed, real-time data processing deadlines.

Item Dynamically reconfigurable data caches in low-power computing (2003)
Brogioli, Michael C.; Cooper, Keith D.
In order to curb microprocessor power consumption, we propose an L1 data cache which can be reconfigured dynamically at runtime according to the cache requirements of a given application.
A two-phase approach is used, involving both compile-time information and runtime monitoring of program performance. The compiler predicts the L1 data cache requirements of loop nests in the input program, and instructs the hardware on how much L1 data cache to enable during a loop nest's execution. For regions of the program not analyzable at compile time, the hardware itself monitors program performance and reconfigures the L1 data cache so as to maintain cache performance while minimizing cache power consumption. In addition, we provide a study of data reuse inside loop nests of the SPEC CPU2000 and Mediabench benchmarks. The sensitivity of data reuse to L1 data cache associativity is analyzed to illustrate the potential power savings a reconfigurable L1 data cache can achieve.

Item Reconfigurable Architectures for Wireless Systems: Design Exploration and Integration Challenges (WWRF, 2004-11-01)
Cavallaro, Joseph R.; Brogioli, Michael C.; de Baynast, Alexandre; Radosavljevic, Predrag; Center for Multimedia Communication
Mobile devices are severely power and area limited due to battery capacity and system size. In many of these example systems, advanced features require computationally complex signal processing on high-speed data streams for enhanced networking capabilities. Thus, mapping high-level communication and networking algorithms to system architectures is a complex and challenging procedure. An important challenge is to characterize the area, time, and power requirements of these embedded system modules and to use this information effectively to determine the architecture of programmable, reconfigurable, and fixed-function modules.
In this paper, we will focus on application examples in wireless networking which highlight these challenges in reconfigurable systems integration.

Item Reconfigurable heterogeneous DSP/FPGA based embedded architectures for numerically intensive computing workloads (2007)
Brogioli, Michael C.; Cavallaro, Joseph R.
Telecommunications and multimedia form a vast segment of the embedded systems market. Variations in standards coupled with the desire for software programmability often result in software based implementations executing on DSP cores. With the advent of data intensive media and communications workloads, the computational demands placed on the DSP are ever increasing. Despite increases in clock rates, the computational demands of many wireless and multimedia video kernels far exceed the available pipeline arithmetic and logic unit (ALU) resources of today's DSP devices. This thesis presents a hardware/software co-design methodology for partitioning real-time embedded multimedia applications between software programmable DSPs and hardware based FPGA coprocessors. Using a strict set of guidelines, input applications are partitioned between software executing on a programmable DSP and a hardware based FPGA implementation. This methodology is applied to channel estimation firmware in 3.5G wireless receivers, as well as software based H.263 video decoders. These heterogeneous systems are prototyped using a custom simulation environment created for these studies, which models bit-true, cycle-accurate heterogeneous embedded architectures. By moving performance critical kernels from software on the DSP to FPGA based loosely coupled coprocessors, significant performance gains over what is possible with modern DSP architectures are shown.
This thesis also investigates the instruction and data level parallelism in modern digital signal processing and multimedia workloads, and presents a retargetable compiler infrastructure for multi-clustered VLIW style digital signal processor architectures. By recompiling existing workloads, the thesis compares the performance of aggressive hardware/software partitioning between modern DSP cores and loosely coupled FPGA based coprocessors with the performance of massively multi-clustered VLIW style architectures. The compiler infrastructure allows existing DSP kernels to be retargeted for user defined machine definitions. In doing this, the thesis shows that increased hardware parallelism within the DSP core can yield significant performance gains, and shows the amount of hardware necessary to compete with FPGA based performance. In conclusion, the thesis advocates application specific DSP design with increased hardware parallelism for modern signal processing and multimedia workloads, as well as loosely coupled hardware based coprocessors for truly high performance computing in these domains.