Browsing by Author "Hemkumar, Nariankadu D."
Now showing 1 - 8 of 8
Item CAPE - VLSI Implementation of a Systolic Processor Array: Architecture, Design and Testing (1991-06-20) Hemkumar, Nariankadu D.; Kota, Kishore; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
The SVD is an important matrix decomposition in many real-time signal processing, image processing and robotics applications. Special-purpose processor arrays can achieve significant speed-up over conventional architectures through the use of efficient parallel algorithms. The CORDIC Array Processor Element (CAPE) is a single-chip VLSI implementation of a processor element for the Brent-Luk-VanLoan systolic array, which computes the SVD of a real matrix. The array utilizes CORDIC (Co-ordinate Rotation Digital Computer) arithmetic to perform the vector rotations and inverse tangent calculations in hardware. A six-chip prototype of the processor has been implemented as TinyChips using the MOSIS fabrication service. Experience gained from designing the prototype helped in the design of the integrated single-chip version. The chip has been implemented on a 5600 x 6900 μm die in a 2 μm n-well scalable CMOS process.

Item Efficient Complex Matrix Transformations with CORDIC (1993-06-20) Hemkumar, Nariankadu D.; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
Transformations of real and arbitrary 2 x 2 matrices are employed in parallel algorithms based on Jacobi-like procedures for matrix factorizations like the eigenvalue and the singular value decompositions. Cast in the primitives afforded by the CORDIC algorithms, significant speedup may be achieved in the performance of special-purpose processor array architectures. In this paper, we discuss the use of CORDIC for unitary two-sided 2 x 2 matrix transformations. We emphasize integration of evaluation of parameters with application of transformations, using only the primitives afforded by CORDIC.
Implementation alternatives are presented for both the non-redundant and the redundant and on-line approaches to CORDIC.

Item Efficient VLSI architectures for matrix factorizations (1994) Hemkumar, Nariankadu D.; Cavallaro, Joseph R.
The SVD (Singular Value Decomposition) is a critical matrix factorization in many real-time computations in application domains that include signal processing and robotics, and complex data matrices are encountered in engineering practice. This thesis advocates the use of CORDIC (Coordinate Rotation Digital Computer) arithmetic for parallel computation of the SVD/eigenvalue decomposition of arbitrary complex/Hermitian matrices using Jacobi-like algorithms on processor arrays. The algorithms and architectures derive from extending the theory of Jacobi-like matrix factorizations to multi-step and inexact pivot (2 x 2) sub-matrix diagonalizations. Based on the former approach of multi-step diagonalization, and using a two-sided 2 x 2 unitary transformation amenable to CORDIC termed the ${\cal Q}$ transformation, it is shown that an arbitrary complex 2 x 2 matrix may be diagonalized in at most two ${\cal Q}$ transformations, while one ${\cal Q}$ transformation is sufficient to diagonalize a 2 x 2 Hermitian matrix. Inexact diagonalizations from the use of approximations to the desired transformations have been advocated in the context of Jacobi-like algorithms for reasons of efficiency. Through a unifying parameterization of approximations, efficacy of diagonalizations and expected convergence behavior, more efficient schemes than those reported in the literature are proposed for 2 x 2 real, real symmetric and Hermitian matrices. Convergence behavior of the different methods was obtained by implementing the algorithms on the CM5 using C* and CMSSL.
All proposed algorithms are cast in ${\cal Q}$ transformations, and CORDIC-based VLSI processor architectures for implementation of the methods are detailed in (non-redundant) CORDIC and the redundant and on-line enhancements to CORDIC. The overhead for the evaluation of the unitary transformations in all cases is minimal, thus enabling the efficient evaluation and/or application, and pipelined execution, of the two-sided 2 x 2 unitary transformations on the different systolic arrays proposed in the literature for SVD and eigenvalue decompositions.

Item Jacobi-like Matrix Factorizations with CORDIC-based Inexact Diagonalizations (1994-06-20) Hemkumar, Nariankadu D.; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
We propose a CORDIC-based Jacobi-like method for parallel computation of the eigenvalues of Hermitian (and real symmetric) matrices and the SVD of real matrices using inexact diagonalizations. It is predicated on the fact that exact diagonalization is not necessary for convergence, and that the potential increase in computation time due to concomitant linear convergence may be offset by reducing the time to evaluate and apply the inexact diagonalizations. We also present results of experiments on the CM5 to determine convergence behavior.

Item Redundant and Online CORDIC for Unitary Transformations (1994-08-20) Hemkumar, Nariankadu D.; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
Two-sided unitary transformations of arbitrary 2 x 2 matrices are needed in parallel algorithms based on Jacobi-like methods for eigenvalue and singular value decompositions of complex matrices.
This paper presents a two-sided unitary transformation structured to facilitate the integrated evaluation of parameters and application of the typically required transformations using only the primitives afforded by CORDIC, thus enabling significant speedup in the computation of these transformations on special-purpose processor array architectures implementing Jacobi-like algorithms. We discuss implementation in (nonredundant) CORDIC to motivate and lead up to implementation in the redundant and on-line enhancements to CORDIC. Both variable and constant scale factor redundant (CFR) CORDIC approaches are detailed, and it is shown that the transformations may be computed in 10n+o time, where n is the data precision in bits and o is a constant accounting for accumulated on-line delays. A more area-intensive approach using a novel on-line CORDIC encoded angle summation/difference scheme reduces computation time to 6n+o. The area/time complexities involved in the various approaches are detailed.

Item Simulation of Systolic Arrays on the Connection Machine (Simulation Councils, Inc., 1993-09) Hemkumar, Nariankadu D.; Cavallaro, Joseph R.; CMC
The use of a programming model which extends naturally from the underlying hardware greatly eases the design and implementation of simulators, especially for those systems that resemble the hardware in the paradigm of computation. Given the characteristics of systolic arrays, SIMD computers which employ the data parallel programming model provide an ideal environment. In this paper, we present a systolic array simulator, a simulation tool written for the Connection Machine (model CM2), a SIMD machine with powerful interprocessor communication capabilities. Especially as recent advances have automated the design, there is a need for a verification environment to prototype systolic arrays.
Primarily a simulation tool, the systolic array simulator also helps identify inefficiencies and motivates optimal design prior to implementation in either custom VLSI or DSP systems. Currently, we are updating the tool to allow the simulation of dynamic array reconfiguration algorithms under transient and permanent fault conditions. The simulator is also being ported to the CM5.

Item Simulation of Systolic Arrays On The Connection Machine (1993-09-20) Hemkumar, Nariankadu D.; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
The use of a programming model which extends naturally from the underlying hardware greatly eases the design and implementation of simulators, especially for those systems that resemble the hardware in the paradigm of computation. Given the characteristics of systolic arrays, SIMD computers which employ the data parallel programming model provide an ideal environment. In this paper, we present a systolic array simulator, a simulation tool written for the Connection Machine (model CM2), a SIMD machine with powerful interprocessor communication capabilities. Especially as recent advances have automated the design, there is a need for a verification environment to prototype systolic arrays. Primarily a simulation tool, the systolic array simulator also helps identify inefficiencies and motivates optimal design prior to implementation in either custom VLSI or DSP systems. Currently, we are updating the tool to allow the simulation of dynamic array reconfiguration algorithms under transient and permanent fault conditions. The simulator is also being ported to the CM5.

Item A Systolic VLSI Architecture for Complex SVD (1992-05-20) Hemkumar, Nariankadu D.; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
A systolic algorithm for the SVD of arbitrary complex matrices, based on the cyclic Jacobi method with "parallel ordering", is presented.
A novel two-step, two-sided unitary transformation scheme, tailored to the use of CORDIC algorithms for high-speed arithmetic, is employed to diagonalize a complex 2 x 2 matrix. Architecturally, the complex SVD array is modeled on the Brent-Luk-VanLoan array for real SVD. An expandable array of O(n²) complex 2 x 2 matrix processors computes the SVD of an n x n matrix in O(n log n) time. A CORDIC architecture for the complex 2 x 2 processor, with an area complexity twice that of a real 2 x 2 processor, is proposed. Computation time for the complex SVD array is less than three times that for a real SVD array with a similar CORDIC-based implementation.
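
The abstracts in this listing repeatedly refer to two CORDIC primitives: vector rotation and inverse tangent calculation. As a rough illustration only, the following Python sketch shows the textbook circular CORDIC iterations in floating point. It is not taken from any of the listed papers and does not model their fixed-point, redundant, or on-line hardware. The function names (`cordic_gain`, `cordic_vectoring`, `cordic_rotate`) and the default iteration count are assumptions made for this example.

```python
import math


def cordic_gain(n_iters):
    """Scale factor accumulated by n_iters micro-rotations (product of sqrt(1 + 2^-2i))."""
    k = 1.0
    for i in range(n_iters):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k


def cordic_vectoring(x, y, n_iters=32):
    """Vectoring mode: drive y toward zero and accumulate the angle.

    Returns (magnitude, angle), approximating (hypot(x, y), atan2(y, x)).
    This sketch assumes x > 0 so the angle lies within the convergence range.
    """
    angle = 0.0
    for i in range(n_iters):
        d = -1.0 if y > 0 else 1.0           # rotate toward the x-axis
        shift = 2.0 ** -i                    # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)        # table lookup in hardware
    return x / cordic_gain(n_iters), angle


def cordic_rotate(x, y, theta, n_iters=32):
    """Rotation mode: rotate (x, y) by theta using shift-and-add micro-rotations.

    This sketch assumes |theta| is at most about 1.74 rad (the sum of all micro-angles).
    """
    z = theta
    for i in range(n_iters):
        d = 1.0 if z >= 0 else -1.0          # drive the residual angle z toward zero
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
    k = cordic_gain(n_iters)
    return x / k, y / k
```

For example, `cordic_vectoring(3.0, 4.0)` should return a magnitude close to 5 and an angle close to `math.atan2(4.0, 3.0)`, and feeding that angle to `cordic_rotate(5.0, 0.0, angle)` should approximately recover `(3.0, 4.0)`. In a Jacobi-like SVD processor, the vectoring step would supply rotation angles and the rotation step would apply them to 2 x 2 data, but how the listed papers combine these steps is described only in the papers themselves.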