Browsing by Author "McCain, Dennis"
Now showing 1 - 18 of 18
Item Compact Hardware Accelerator for Functional Verification and Rapid Prototyping of 4G Wireless Communication Systems (2004-11-01)
Guo, Yuanbin; McCain, Dennis; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we propose an FPGA-based hardware accelerator platform with Xilinx Virtex-II V3000 in a compact PCMCIA form factor. By partitioning the complex algorithms in the 4G simulator to the hardware accelerator, we apply an efficient Catapult-C methodology to quickly evaluate the area/speed tradeoffs and rapidly schedule synthesizable RTL models for implementation. The simulation time is accelerated by 100× for a QRD-M algorithm. This not only enables much faster verification in the 4G standard environment, but also provides software/hardware co-design and rapid prototyping of the core algorithm in a realistic fixed-point platform.

Item Displacement MIMO Kalman equalizer architecture for CDMA downlink in fast fading channels (2005-07-01)
Guo, Yuanbin; Zhang, Jianzhong (Charlie); McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we explore the displacement structure in a Kalman equalizer for MIMO-CDMA downlink. A streamlined MIMO Kalman equalizer architecture is proposed to extract the commonality in the data path by exploiting the displacement structure of the transition matrix and the block-Toeplitz structure of the channel matrix. Numerical matrix multiplications with O(F^3) complexity are eliminated by simple data loading process. Utilizing the block Toeplitz structure of the channel matrix, an FFT-based acceleration is proposed to avoid direct matrix multiplications in the time domain. Finally, an iterative Conjugate-Gradient based algorithm is proposed to avoid the inversion of the innovation correlation matrix in Kalman gain calculation.
The proposed architecture not only reduces the numerical complexity to O(F log2 F) per chip, but also facilitates the parallel and pipelined VLSI implementation for real-time processing.

Item Displacement MIMO Kalman Equalizer for CDMA Downlink in Fast Fading Channels (2005-11-01)
Guo, Yuanbin; Zhang, Jianzhong (Charlie); McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, a streamlined MIMO Kalman equalizer architecture is proposed to extract the commonality in the data path by jointly considering the displacement structure of the transition matrix and the block-Toeplitz structure of the channel matrix. Finally, an iterative Conjugate-Gradient based algorithm is proposed to avoid the inverse of the Hermitian symmetric innovation correlation matrix in Kalman gain processor. The proposed architecture not only reduces the numerical complexity to O(F log F) per chip, but also facilitates the parallel and pipelined VLSI implementation in real-time processing.

Item An Efficient Circulant MIMO Equalizer for CDMA Downlink: Algorithm and VLSI Architecture (Hindawi Publishing Corporation, 2006-02-01)
Guo, Yuanbin; Zhang, Jianzhong; McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communication
We present an efficient circulant approximation-based MIMO equalizer architecture for the CDMA downlink. This reduces the direct matrix inverse (DMI) of size (NF×NF) with O((NF)^3) complexity to some FFT operations with O(NF log2(F)) complexity and the inverse of some (N×N) submatrices. We then propose parallel and pipelined VLSI architectures with Hermitian optimization and reduced-state FFT for further complexity optimization. Generic VLSI architectures are derived for the (4×4) high-order receiver from partitioned (2×2) submatrices. This leads to more parallel VLSI design with 3× further complexity reduction.
Comparative study with both the conjugate-gradient and DMI algorithms shows very promising performance/complexity tradeoff. VLSI design space in terms of area/time efficiency is explored extensively for layered parallelism and pipelining with a Catapult C high-level-synthesis methodology.

Item An Efficient Circulant MIMO Equalizer for CDMA Downlink: Algorithm and VLSI Architecture (2005-12-01)
Guo, Yuanbin; Zhang, Jianzhong (Charlie); McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we present an efficient circulant approximation based MIMO equalizer architecture for the CDMA downlink. This reduces the Direct-Matrix-Inverse (DMI) of size (NF x NF) with O((NF)³) complexity to some FFT operations with O(NF log2(F)) complexity and the inverse of some (N x N) sub-matrices. We then propose parallel and pipelined VLSI architectures with Hermitian optimization and reduced-state FFT for further complexity optimization. Generic VLSI architectures are derived for the (4 x 4) high-order receiver from partitioned (2 x 2) sub-matrices. This leads to more parallel VLSI design with 3x further complexity reduction. Comparative study with both the Conjugate-Gradient and DMI algorithms shows very promising performance/complexity tradeoff. VLSI design space in terms of area/time efficiency is explored extensively for layered parallelism and pipelining with a Catapult C High-Level-Synthesis methodology.

Item Efficient MIMO equalization for downlink multi-code CDMA: complexity optimization and comparative study (2004-11-01)
Guo, Yuanbin; Zhang, Jianzhong (Charlie); McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we present an efficient LMMSE chip equalizer to suppress the interference caused by the multipath fading channel in the MIMO multi-code CDMA downlink. The block-Toeplitz structure in the correlation matrix is approximated with a block circulant matrix.
An FFT-based algorithm is applied to avoid the Direct-Matrix-Inverse (DMI) in the system equation. Hermitian optimization is proposed to further reduce the complexity. A comparative study in both performance and complexity with the Conjugate-Gradient (CG) algorithm is then presented. The simulation shows very promising results for the FFT-based equalizer compared with both the DMI and CG algorithms.

Item FFT-Accelerated Iterative MIMO Chip Equalizer Architecture For CDMA Downlink (2005-03-01)
Guo, Yuanbin; McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we present a novel FFT-accelerated iterative Linear MMSE chip equalizer in the MIMO CDMA downlink receiver. The reversed form time-domain matrix multiplication in the Conjugate Gradient iteration is accelerated by an equivalent frequency-domain circular convolution with FFT-based "overlap-save" architecture. The iteration rapidly refines a crude initial approximation to the actual final equalizer taps. This avoids the Direct-Matrix-Inverse with O((NL)³) complexity, and reduces the standard CG complexity from O((NL)²) to O(NL log2(NL)). Simulation demonstrates strong numerical stability and promising performance/complexity tradeoff, especially for very long channels.

Item Hermitian Optimization and Scalable VLSI Architecture for Circulant Approximated MIMO Equalizer in CDMA Downlink (2005-09-01)
Guo, Yuanbin; McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we propose a parallel and pipelined VLSI architecture for a circulant approximated equalizer for the MIMO-CDMA systems. The FFT-based tap solver reduces the Direct-Matrix-Inverse of the size (NF x NF) to the inverse of O(N) sub-matrices of the size (N x N). Hermitian optimization and tree pruning is proposed to reduce the number and complexity of the FFTs.
A divide-and-conquer method partitions the 4×4 sub-matrices into 2x2 sub-matrices and simplifies the inverse of sub-matrices. Generic VLSI architecture is derived to eliminate the redundancies in the complex operations. Multiple level parallelism and pipelining is investigated with a Catapult C High-Level-Synthesis (HLS) methodology. This leads to efficient VLSI architectures with 3x further complexity reduction. The scalable VLSI architectures are prototyped with the Xilinx FPGAs and achieve area/time efficiency.

Item Low Complexity System-On-Chip Architectures Of Optimal Parallel-Residue-Compensation In CDMA Systems (2004-05-01)
Guo, Yuanbin; McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we propose a novel multi-stage Parallel-Residue-Compensation (PRC) receiver architecture for enhanced suppression of the MAI in CDMA systems. We extract the commonality to avoid the direct Interference Cancellation and reduce the algorithm complexity from O(K²N) to O(KN). In the second part, scalable VLSI architectures are implemented in an FPGA prototyping system with an efficient Precision-C System-on-Chip (SOC) design methodology. Hardware efficiency is achieved by investigating multi-level parallelism and pipelines. The design of Sum-Sub-MUX Unit (SMU) combinational logic avoids the usage of dedicated multipliers with at least 10X saving in hardware resources.
The most area/timing efficient design only uses area similar to the most area constraint architecture but gives at least 4X speedup over a conventional design.

Item Low Power VLSI Architecture for Adaptive MAI Suppression in CDMA Using Multi-stage Convergence Masking Vector (2005-09-01)
Guo, Yuanbin; McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we propose a novel low power and low complexity multi-stage Parallel-Residue-Compensation (PRC) architecture for enhanced MAI suppression in the CDMA systems. The accuracy of the interference cancellation is improved with a set of weights computed from an adaptive Normalized Least-Mean-Square (NLMS) algorithm. The physical meaning of the complete versus weighted interference cancellation is applied to clip the weights above a certain threshold. Multistage Convergence-Masking-Vector (CMV) is then proposed to combine with the clock gating as a dynamic power management scheme in the VLSI receiver architecture. This reduces the dynamic power consumption in the VLSI architecture by up to 90% with a negligible performance loss.

Item Rapid Industrial Prototyping and Scheduling of 3G/4G SoC Architectures with HLS Methodology (2005-12-01)
Guo, Yuanbin; McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we present a Catapult C/C++ based methodology that integrates key technologies for high-level VLSI modelling of 3G/4G wireless systems to enable extensive time/area tradeoff study. A Catapult C/C++ based architecture scheduler transfers the major workload to the algorithmic C/C++ fixed-point design. Prototyping experiences are presented to explore the VLSI design space extensively for various types of computational intensive algorithms in the HSDPA, MIMO-CDMA and MIMO-OFDM systems, such as synchronization, MIMO equalizer and the QRD-M detector.
Extensive time/area tradeoff study is enabled with different architecture and resource constraints in a short design cycle. The industrial design experience demonstrates significant improvement in architecture efficiency and productivity, which enables truly rapid prototyping for the 3G and beyond wireless systems.

Item Rapid Industrial Prototyping and SoC Design of 3G/4G Wireless Systems Using an HLS Methodology (Hindawi Publishing Corporation, 2006-07-01)
Guo, Yuanbin; McCain, Dennis; Cavallaro, Joseph R.; Takach, Andres; Center for Multimedia Communication
Many very-high-complexity signal processing algorithms are required in future wireless systems, giving tremendous challenges to real-time implementations. In this paper, we present our industrial rapid prototyping experiences on 3G/4G wireless systems using advanced signal processing algorithms in MIMO-CDMA and MIMO-OFDM systems. Core system design issues are studied and advanced receiver algorithms suitable for implementation are proposed for synchronization, MIMO equalization, and detection. We then present VLSI-oriented complexity reduction schemes and demonstrate how to interact these high-complexity algorithms with an HLS-based methodology for extensive design space exploration. This is achieved by abstracting the main effort from hardware iterations to the algorithmic C/C++ fixed-point design. We also analyze the advantages and limitations of the methodology.
Our industrial design experience demonstrates that it is possible to enable an extensive architectural analysis in a short-time frame using HLS methodology, which significantly shortens the time to market for wireless systems.

Item Rapid Scheduling of Efficient FPGA Architectures for Next-Generation HSDPA Wireless System Using Precision C Synthesizer (2003-06-20)
Guo, Yuanbin; McCain, Dennis; Xu, Gang; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, an efficient design flow integrating Mentor Graphics Precision C and HDL Designer is derived. In this hybrid prototyping environment, efficient FPGA architectures are scheduled rapidly with specific hardware resource/timing/architecture constraints from C/C++ level modeling by allocating the usage of functional units and real-time requirements. Using this methodology, a system-on-chip architecture for the next-generation CDMA system, i.e., HSDPA system, is prototyped rapidly. Advanced algorithms including chip-level equalizer, turbo codec and clock tracking, frequency offset compensation, are scheduled with Precision C. A relatively more area/timing efficient RTL architecture is generated automatically and integrated with other design blocks in HDL Designer, then implemented efficiently in Xilinx FPGAs. This new design flow demonstrates productivity improvement of 2X for typical wireless communication algorithms and reduces the risk of product development dramatically.

Item Reduced QRD-M Detector in MIMO-OFDM Systems With Partial and Embedded Sorting (IEEE ComSoc, 2005-11-01)
Guo, Yuanbin; McCain, Dennis; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we present a reduced QRD-M matrix symbol detector in MIMO-OFDM systems. The QRD-M algorithm first decomposes the MIMO channel matrix into upper triangular matrix and applies a limited tree search to approximate the maximum-likelihood detector.
The metric update is reduced from O(4.5MC) to O(1.5MC) by extracting the commonality. We then propose a partial quick-sort procedure and an embedded insert sort to achieve almost linear sorting. In the second part, we present efficient VLSI architectures utilizing the parallelism between subcarriers and design the pipelining in the multi-stage MIMO processing. The real-time architecture is implemented in an FPGA-based hardware accelerator with compact form factor, which achieves up to 100× speedup in the simulation time.

Item Scalable FPGA Architectures for LMMSE-based SIMO Chip Equalizer in HSDPA Downlink (IEEE, 2003-11-01)
Guo, Yuanbin; McCain, Dennis; Zhang, Jianzhong (Charlie); Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, scalable FPGA architectures for the LMMSE-based chip-level equalizer in HSDPA downlink receivers are studied. An FFT-based algorithm is applied to avoid the direct matrix inverse by utilizing the block-Toeplitz structure of the correlation matrix. A Pipelined-Multiplexing-Scheduler (PMS) is designed in the front-end to achieve scalable computation of the correlation coefficients. Very efficient VLSI architectures are designed by investigating the multiple level parallelism and pipelining with a Precision-C based High-Level-Synthesis (HLS) design methodology. A 1×2 Single-Input-Multiple-Output (SIMO) downlink receiver is designed and integrated in the HSDPA prototype system with Xilinx Virtex-II XC2V6000 FPGAs.
The design demonstrates more area/time efficiency by achieving the best tradeoffs between the usage of functional units and real-time requirements.

Item Structured Iterative and Circulant MIMO Chip Equalizer Architectures with FFT-acceleration for CDMA Systems (2005-07-01)
Guo, Yuanbin; McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we propose a class of novel structured linear MMSE chip equalizer architectures for the MIMO CDMA systems using FFT-accelerations. First, a Conjugate Gradient (CG) algorithm is applied to avoid the Direct Matrix Inverse (DMI), which has O((NF)^3) complexity. By utilizing a reversed form block-Toeplitz structure, the matrix multiplication in the CG iteration is accelerated by an equivalent frequency-domain FFT-based "overlap-save" architecture. The iteration rapidly refines a crude initial approximation to the actual final equalizer taps and significantly reduces complexity from O((NF)^2) to O(NF log2(F)). Secondly, we propose a circulant architecture which also utilizes FFT-based acceleration by approximating the DMI with a block-circulant structure. An extensive comparative analysis in performance, numerical stability and complexity demonstrates promising performance/complexity tradeoff, especially for very long channels. Both algorithms not only reduce the complexity dramatically, but also provide unified parallel and pipelined structures, which is essential for practical real-time VLSI implementation of MIMO systems.

Item Structured Parallel Architecture for Displacement MIMO Kalman Equalizer in CDMA Systems (IEEE, 2007-02-01)
Guo, Yuanbin; Zhang, Jianzhong; McCain, Dennis; Cavallaro, Joseph R.; Center for Multimedia Communication
A reduced complexity MIMO Kalman equalizer architecture is proposed in this brief by jointly considering the displacement structure and the block-Toeplitz structure.
Numerical matrix–matrix multiplications with O(F^3) complexity are eliminated by simple data loading process, where F is the spreading factor. Finally, an iterative Conjugate-Gradient based algorithm is proposed to avoid the inverse of the Hermitian symmetric innovation covariance matrix in Kalman gain processor. The proposed architecture not only reduces the numerical complexity from O(F^2) to O(F log2 F) per chip, but also facilitates the parallel and pipelined VLSI implementation in real-time processing.

Item Untimed-C based SoC Architecture Design Space Exploration for 3G and Beyond Wireless Systems (2005-02-01)
Guo, Yuanbin; McCain, Dennis; Center for Multimedia Communications (http://cmc.rice.edu/)
In this paper, we propose an un-timed C/C++ level verification methodology that integrates key technologies for truly high-level VLSI modelling to keep pace with the explosive complexity of SoC designs in the 3G and beyond wireless communications. A Catapult C/C++ based architecture scheduler transfers the major workload to the algorithmic C/C++ fixed-point design. Case study is given to explore the VLSI design space extensively for various types of computational intensive algorithms in MIMO-CDMA systems, such as a MIMO equalizer to avoid the Direct-Matrix-Inverse. Extensive time/area tradeoff study is enabled with different architecture and resource constraints in a short design cycle. Architecture efficiency and productivity are improved significantly, enabling truly rapid prototyping for the 3G and beyond wireless systems.
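
Several of the abstracts listed here describe avoiding a Direct-Matrix-Inverse by approximating a Toeplitz correlation matrix with a circulant one and solving with FFTs. The following is a minimal sketch of that general idea only, for the scalar case (a single circulant matrix, power-of-two size), and is not the authors' architecture. The function names `fft`, `solve_circulant`, and `circulant_matvec` are hypothetical, chosen for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def solve_circulant(c, b):
    """Solve C x = b, where C is circulant with first column c.

    A circulant matrix is diagonalized by the DFT, so the solve reduces
    to an element-wise division in the frequency domain. Assumes every
    eigenvalue fft(c)[k] is nonzero.
    """
    n = len(c)
    eig = fft(c)  # eigenvalues of C
    rhs = fft(b)
    x_freq = [r / e for r, e in zip(rhs, eig)]
    # The inverse transform above is unscaled, so divide by n here.
    return [v / n for v in fft(x_freq, inverse=True)]


def circulant_matvec(c, x):
    """Direct O(n^2) product C x, used only to check the result."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    c = [4, 1, 0, 1]  # first column of a small circulant matrix
    b = [1, 2, 3, 4]
    x = solve_circulant(c, b)
    residual = [abs(r - bi) for r, bi in zip(circulant_matvec(c, x), b)]
    print(max(residual))  # should be close to zero
```

The solve costs O(n log n) instead of the O(n^3) of a direct inverse. The MIMO case in the abstracts works on block-circulant matrices, which additionally requires inverting small (N x N) sub-matrices per frequency bin; that step is not shown here.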