Browsing by Author "Yin, Bei"
Item Flexible N-Way MIMO Detector on GPU (IEEE Computer Society, 2012-10-17) Wu, Michael; Yin, Bei; Cavallaro, Joseph R.; CMC
This paper proposes a flexible Multiple-Input Multiple-Output (MIMO) detector on graphics processing units (GPUs). MIMO detection is a key technology in broadband wireless systems such as LTE, WiMAX, and 802.11n. Existing detectors either use costly sorting for better performance or sacrifice sorting for higher throughput. To achieve good performance with high throughput, our detector runs multiple search passes in parallel, where each search pass detects the transmit streams in a different permuted detection order. We show that this flexible detector, including QR decomposition preprocessing, outperforms existing GPU MIMO detectors while maintaining good bit error rate (BER) performance. In addition, this detector can achieve different tradeoffs between throughput and accuracy by changing the number of parallel search passes.

Item FPGA Prototyping of A High Data Rate LTE Uplink Baseband Receiver (IEEE, 2009-11-01) Wang, Guohui; Yin, Bei; Amiri, Kiarash; Sun, Yang; Wu, Michael; Cavallaro, Joseph R.; Center for Multimedia Communication
The Third Generation Partnership Project (3GPP) Long Term Evolution (LTE) standard is becoming the appropriate choice to pave the way for the next generation wireless and cellular standards. While the popular OFDM technique has been adopted and implemented in previous standards and also in the LTE downlink, it suffers from a high peak-to-average-power ratio (PAPR). High PAPR requires more sophisticated power amplifiers (PAs) in the handsets and results in lower-efficiency PAs. To combat such effects, the LTE uplink uses the novel Single Carrier Frequency Division Multiple Access (SC-FDMA) scheme, which has lower PAPR due to its inherent signal structure.
While reducing the PAPR, SC-FDMA requires a more complicated detector structure in the base station for multi-antenna and multi-user scenarios. Since the multi-antenna and multi-user scenarios are critical parts of the LTE standard to deliver high performance and data rate, it is important to design novel architectures to ensure high reliability and data rate in the receiver. In this paper, we propose a flexible architecture for a high data rate LTE uplink receiver with multiple receive antennas and implement a single-FPGA prototype of this architecture. The architecture is verified on WARPLab (a software defined radio platform based on the Rice Wireless Open-access Research Platform) and tested in a real over-the-air indoor channel.

Item High-Level Design Tools for Complex DSP Applications (Elsevier, Waltham, MA, 2012-07-12) Sun, Yang; Amiri, Kiarash; Wang, Guohui; Yin, Bei; Cavallaro, Joseph R.; Ly, Tai; Center for Multimedia Communication
High-level synthesis design methodology: High-level synthesis (HLS) [1], also known as behavioral synthesis and algorithmic synthesis, is a design process in which a high-level, functional description of a design is automatically compiled into an RTL implementation that meets certain user-specified design constraints. The HLS design description is ‘high level’ compared to RTL in two aspects: design abstraction and specification language.

Item Implementation Trade-Offs For Linear Detection In Large-Scale MIMO Systems (IEEE, 2013-06) Yin, Bei; Wu, Michael; Studer, Christoph; Cavallaro, Joseph R.; Dick, Chris
In this paper, we analyze the VLSI implementation tradeoffs for linear data detection in the uplink of large-scale multiple-input multiple-output (MIMO) wireless systems. Specifically, we analyze the error incurred by using the suboptimal, low-complexity matrix inverse proposed in Wu et al., 2013, ISCAS, and compare its performance and complexity to an exact matrix inversion algorithm.
We propose a Cholesky-based reference architecture for exact matrix inversion and show corresponding implementation results on a Virtex-7 FPGA. Using this reference design, we perform a performance/complexity trade-off comparison with an FPGA implementation for the proposed approximate matrix inversion, which reveals that the inversion circuit of choice is determined by the antenna configuration (base-station antennas vs. number of users) of large-scale MIMO systems.

Item Implicit vs. Explicit Approximate Matrix Inversion for Wideband Massive MU-MIMO Data Detection (Springer, 2017) Wu, Michael; Yin, Bei; Li, Kaipeng; Dick, Chris; Cavallaro, Joseph R.; Studer, Christoph
Massive multi-user (MU) MIMO wireless technology promises improved spectral efficiency compared to that of traditional cellular systems. While data-detection algorithms that rely on linear equalization achieve near-optimal error-rate performance for massive MU-MIMO systems, they require the solution to large linear systems at high throughput and low latency, which results in excessively high receiver complexity. In this paper, we investigate a variety of exact and approximate equalization schemes that solve the system of linear equations either explicitly (requiring the computation of a matrix inverse) or implicitly (by directly computing the solution vector). We analyze the associated performance/complexity trade-offs, and we show that for small base-station (BS)-to-user-antenna ratios, exact and implicit data detection using the Cholesky decomposition achieves near-optimal performance at low complexity. In contrast, implicit data detection using approximate equalization methods results in the best trade-off for large BS-to-user-antenna ratios.
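The explicit/implicit distinction drawn in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small real-valued, symmetric positive-definite system matrix `a` (in a detector this would be a regularized Gram matrix), and the function names `cholesky`, `cholesky_solve`, `solve_implicit`, and `solve_explicit` are hypothetical. The implicit path solves for the solution vector directly by forward and backward substitution; the explicit path first forms the full inverse and then multiplies.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T (a must be symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def cholesky_solve(l, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = y
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def solve_implicit(a, b):
    """Implicit: compute the solution vector directly, never forming inv(a)."""
    return cholesky_solve(cholesky(a), b)


def solve_explicit(a, b):
    """Explicit: build inv(a) column by column, then multiply it by b."""
    n = len(b)
    l = cholesky(a)
    # cols[j] is column j of inv(a), obtained by solving against the j-th unit vector.
    cols = [cholesky_solve(l, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [sum(cols[j][i] * b[j] for j in range(n)) for i in range(n)]


# Hypothetical 2x2 example system.
a = [[4.0, 2.0], [2.0, 3.0]]
b = [2.0, 1.0]
print(solve_implicit(a, b), solve_explicit(a, b))
```

Both paths return the same vector up to rounding. The explicit path performs one substitution pass per column of the inverse, which only pays off when the same inverse is reused for many right-hand sides.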
By combining the advantages of exact, approximate, implicit, and explicit matrix inversion, we develop a new frequency-adaptive equalizer (FADE), which outperforms existing data-detection methods in terms of performance and complexity for wideband massive MU-MIMO systems.

Item Low Complexity Detection and Precoding for Massive MIMO Systems: Algorithm, Architecture, and Application (2014-12-03) Yin, Bei; Cavallaro, Joseph R.; Aazhang, Behnaam; Hicks, Illya V.; Studer, Christoph
Massive (or large-scale) MIMO is an emerging technology to improve the spectral efficiency of existing (small-scale) MIMO wireless communication systems. The main idea is to equip the base station (BS) with hundreds of antennas that serve a small number of users (on the order of tens) simultaneously in the same frequency band. In such a system, data detection and precoding are among the most challenging tasks in terms of computational complexity and performance. Although theoretical results show that simple detection and precoding algorithms are able to achieve optimal error-rate performance when the number of BS antennas approaches infinity, systems with realistic antenna configurations have to resort to computationally expensive algorithms to achieve near-optimal performance. In this research, we show that by utilizing the special property of massive MIMO systems, approximate linear detection and precoding can deliver near-optimal error-rate performance with low complexity. We first propose approximate methods relying on the Neumann series. This approach requires lower computational complexity than an exact inversion while delivering near-optimal results when there is a large ratio between BS and user antennas. We then develop a novel reconfigurable VLSI architecture to perform both the necessary Gram matrix computation and the Neumann-series-based matrix inversion.
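As a rough illustration of the Neumann series idea mentioned above, the sketch below approximates the inverse of a regularized Gram matrix A = HᵀH + σ²I by splitting A into its diagonal D and off-diagonal part E and truncating the series A⁻¹ = Σₖ (−D⁻¹E)ᵏ D⁻¹. This is a generic textbook form under stated assumptions, not the thesis's architecture: the channel is taken as real-valued for brevity (a complex channel would use the conjugate transpose), and the names `gram`, `neumann_inverse`, and `max_residual` are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def gram(h, sigma2):
    """Regularized Gram matrix H^T H + sigma2 * I for a real-valued channel h."""
    users = len(h[0])
    return [[sum(row[i] * row[j] for row in h) + (sigma2 if i == j else 0.0)
             for j in range(users)] for i in range(users)]


def neumann_inverse(a, terms):
    """Approximate inv(a) with `terms` terms of the series around the diagonal of a."""
    n = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    e = [[0.0 if i == j else a[i][j] for j in range(n)] for i in range(n)]
    m = [[-x for x in row] for row in matmul(d_inv, e)]  # M = -D^{-1} E
    acc = [[0.0] * n for _ in range(n)]
    power = identity(n)  # M^0
    for _ in range(terms):
        acc = [[acc[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, m)
    return matmul(acc, d_inv)  # (sum_k M^k) D^{-1}


def max_residual(a, a_inv):
    """Largest absolute entry of a_inv * a - I; zero for an exact inverse."""
    n = len(a)
    p = matmul(a_inv, a)
    return max(abs(p[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


# Hypothetical diagonally dominant matrix, as arises when BS antennas far outnumber users.
a = [[10.0, 1.0, 2.0], [1.0, 12.0, 1.0], [2.0, 1.0, 9.0]]
for terms in (1, 3, 8):
    print(terms, max_residual(a, neumann_inverse(a, terms)))
```

The residual after K terms equals (−D⁻¹E)ᴷ, so the approximation improves quickly only when the diagonal dominates, which is the large BS-to-user-antenna regime described in the abstract.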
The Neumann series approach, however, suffers from a considerable error-rate performance loss if the ratio of BS to user antennas is not large enough. To improve the performance, we investigate the conjugate gradient (CG) method (which avoids explicitly computing the matrix inverse) and the conjugate gradient least squares (CGLS) method (which avoids explicitly computing both the Gram matrix and the matrix inverse). Although CG and CGLS for precoding are rather straightforward, the signal-to-interference-and-noise ratio (SINR) needed for soft-output detection is not computed by CG and CGLS. To solve this problem, we propose an exact and an approximate method to compute the SINR within the CG and CGLS algorithms with low complexity. We show that, compared to exact and Neumann-series-based linear methods, the CG-based detection and precoding method is suitable for systems with a small to medium number of users, while CGLS is suitable for systems with a large number of users. A novel reconfigurable VLSI architecture is then proposed to support both CG and CGLS.

Item Low complexity MMSE based interference cancellation for LTE uplink MIMO receiver (Wireless Innovation Forum, 2011-12-01) Yin, Bei; Cavallaro, Joseph R.; Center for Multimedia Communication
In this paper, we propose a novel low complexity minimum mean square error (MMSE) interference cancellation (IC) scheme to minimize the residual inter-symbol and inter-antenna interference in the LTE/LTE-Advanced uplink. In the LTE/LTE-Advanced base station, frequency domain equalizers (FDEs) are adopted to achieve good performance. However, in multi-tap channels, the residual interference of the FDE still degrades the performance. Conventional IC schemes can minimize this interference, but have high complexity and large feedback latency. These result in low throughput and require a large amount of resources in a software defined radio (SDR) implementation.
We show that our scheme can bring gains of up to 8 dB in different channels, but only adds up to 7.2% complexity to the receiver. Compared to conventional IC, our scheme has fewer multiplications, less data to store, and shorter feedback latency.

Item Low Complexity Opportunistic Decoder for Network Coding (IEEE, 2012-12-01) Yin, Bei; Wu, Michael; Wang, Guohui; Cavallaro, Joseph R.; CMC
In this paper, we propose a novel opportunistic decoding scheme for network coding decoders which significantly reduces the decoder complexity and increases the throughput. Network coding was proposed to improve network throughput and reliability, especially for multicast transmissions. Although network coding increases the network performance, the complexity of the network coding decoder algorithm is still high, especially for higher dimensional finite fields or larger network codes. Different software and hardware approaches have been proposed to accelerate the decoding algorithm, but the decoder remains the bottleneck for high speed data transmission. We propose a novel decoding scheme which exploits the structure of the network coding matrix to reduce the network decoder complexity and improve throughput. We also implemented the proposed scheme on a Virtex-7 FPGA and compared our implementation to the widely used Gaussian elimination.

Item LTE uplink MIMO receiver with low complexity interference cancellation (Springer, 2012-11-01) Yin, Bei; Cavallaro, Joseph R.; CMC
In LTE/LTE-A uplink receivers, frequency domain equalizers (FDEs) are adopted to achieve good performance. However, in multi-tap channels, residual inter-symbol and inter-antenna interference still exists after the FDE and degrades the performance. Conventional interference cancellation schemes can minimize this interference by using frequency domain interference cancellation. However, those schemes have high complexity and large feedback latency, especially when adopting a large number of iterations.
These result in low throughput and require a large amount of resources in a software defined radio implementation. In this paper, we propose a novel low complexity interference cancellation scheme to minimize the residual interference in the LTE/LTE-A uplink. Our proposed scheme can bring gains of about 2 dB in different channels, but only adds up to 7.2% complexity to the receiver. The scheme is further implemented on a Xilinx FPGA. Compared to other conventional interference cancellation schemes, our scheme has less complexity, less data to store, and shorter feedback latency.

Item Parallel Nonbinary LDPC Decoding on GPU (IEEE, 2012-12-01) Wang, Guohui; Shen, Hao; Yin, Bei; Wu, Michael; Sun, Yang; Cavallaro, Joseph R.
Nonbinary Low-Density Parity-Check (LDPC) codes are a class of error-correcting codes constructed over the Galois field GF(q) for q > 2. As extensions of binary LDPC codes, nonbinary LDPC codes can provide better error-correcting performance when the code length is short or moderate, but at the cost of higher decoding complexity. This paper proposes a massively parallel implementation of a nonbinary LDPC decoding accelerator based on a graphics processing unit (GPU) to achieve both great flexibility and scalability. The implementation maps the Min-Max decoding algorithm to the GPU’s massively parallel architecture. We highlight the methodology to partition the decoding task across a heterogeneous platform consisting of the CPU and GPU. The experimental results show that our GPU-based implementation can achieve high throughput while still providing great flexibility and scalability.

Item Reconfigurable Multi-Standard Uplink MIMO Receiver with Partial Interference Cancellation (IEEE, 2012-06-01) Yin, Bei; Amiri, Kiarash; Cavallaro, Joseph R.; Guo, Yuanbin; Center for Multimedia Communication
As HSPA/HSPA+ and LTE/LTE-A evolve in parallel, the reconfigurability of a receiver to support multiple standards has become more and more important, especially for small cells.
In this paper, we first suggest a reconfigurable multi-standard uplink MIMO receiver based on a frequency domain equalizer. Then, to improve the performance, we propose two low-complexity partial iterative interference cancellation (IC) schemes to deal with the residual inter-chip and inter-antenna interference in HSPA/HSPA+ and the residual inter-symbol and inter-antenna interference in LTE/LTE-A. Compared with a receiver consisting of separate HSPA/HSPA+ and LTE/LTE-A uplink receivers, this reconfigurable receiver can save up to 66.9% complexity. Moreover, the two partial IC schemes have negligible performance loss compared with a full IC scheme. They can achieve 2 dB gains in both standards with only 15.2% additional complexity relative to a receiver with no IC.