Browsing by Author "Li, Kaipeng"
Decentralized Baseband Processing for Massive MU-MIMO Systems (IEEE, 2017)
Li, Kaipeng; Sharan, Rishi; Chen, Yujun; Goldstein, Tom; Cavallaro, Joseph R.; Studer, Christoph
Achieving high spectral efficiency in realistic massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems requires computationally complex algorithms for data detection in the uplink (users transmit to the base station) and beamforming in the downlink (the base station transmits to users). Most existing algorithms are designed to be executed on centralized computing hardware at the base station (BS), which results in prohibitive complexity for systems with hundreds or thousands of antennas and generates raw baseband data rates that exceed the limits of current interconnect technology and chip I/O interfaces. This paper proposes a novel decentralized baseband processing architecture that alleviates these bottlenecks by partitioning the BS antenna array into clusters, each associated with independent radio-frequency chains, analog and digital modulation circuitry, and computing hardware. For this architecture, we develop novel decentralized data detection and beamforming algorithms that access only local channel-state information and require low communication bandwidth among the clusters.
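To make the bandwidth argument concrete, here is a minimal sketch of one generic way to decentralize linear MMSE detection, assuming the rows of the channel matrix H are split across clusters. Because HᴴH and Hᴴy are sums of per-cluster terms, each cluster can send a U × U matrix and a length-U vector to a fusion node, independent of how many antennas it holds. This is an illustrative assumption, not the specific algorithm of this entry; the names `local_statistics` and `fuse_mmse` and all sizes are hypothetical.

```python
import numpy as np

def local_statistics(H_c, y_c):
    """Per-cluster step: touches only this cluster's channel rows H_c (B_c x U) and samples y_c."""
    G_c = H_c.conj().T @ H_c  # U x U local Gram matrix
    z_c = H_c.conj().T @ y_c  # length-U local matched-filter output
    return G_c, z_c

def fuse_mmse(stats, reg):
    """Fusion node: adds the per-cluster messages and solves (G + reg*I) x = z."""
    G = sum(G_c for G_c, _ in stats)
    z = sum(z_c for _, z_c in stats)
    return np.linalg.solve(G + reg * np.eye(G.shape[0]), z)

# Hypothetical setup: B antennas split into C clusters, U single-antenna users.
rng = np.random.default_rng(0)
B, U, C = 64, 8, 4
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
s = rng.choice([-1.0, 1.0], U) + 1j * rng.choice([-1.0, 1.0], U)  # QPSK, energy Es = 2
N0 = 0.1
y = H @ s + np.sqrt(N0 / 2) * (rng.standard_normal(B) + 1j * rng.standard_normal(B))
reg = N0 / 2.0  # N0 / Es, the MMSE regularizer

clusters = np.array_split(np.arange(B), C)
stats = [local_statistics(H[idx], y[idx]) for idx in clusters]
x_decentralized = fuse_mmse(stats, reg)
x_centralized = np.linalg.solve(H.conj().T @ H + reg * np.eye(U), H.conj().T @ y)
```

The decentralized and centralized results agree because the fusion step sums exactly the terms a centralized detector would compute. The Gram messages can be reused while the channel stays constant, whereas the length-U matched-filter vectors must be sent for every received vector.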
We study the associated trade-offs between error-rate performance, computational complexity, and interconnect bandwidth, and we demonstrate the scalability of our solutions for massive MU-MIMO systems with thousands of BS antennas using reference implementations on a graphics processing unit (GPU) cluster.

Decentralized Baseband Processing for Massive MU-MIMO Systems (2019-04-19)
Li, Kaipeng; Cavallaro, Joseph
Achieving high spectral efficiency in realistic massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems requires computationally complex algorithms for data detection in the uplink (users transmit to the base station) and beamforming in the downlink (the base station transmits to users). Most existing algorithms are designed to be executed on centralized computing hardware at the base station (BS), which results in prohibitive complexity for systems with hundreds or thousands of antennas and generates raw baseband data rates that exceed the limits of current interconnect technology and chip I/O interfaces. This thesis proposes novel decentralized baseband processing architectures that alleviate these bottlenecks by partitioning the BS antenna array into clusters, each associated with independent radio-frequency chains, analog and digital modulation circuitry, and computing hardware. For these decentralized architectures, we develop novel decentralized data detection and beamforming algorithms that access only local channel-state information and require low communication bandwidth among the clusters. We first propose a decentralized consensus-sharing architecture. With this architecture, each cluster performs local baseband processing in parallel and then shares its local results, which requires little data transfer, to compute a global consensus at a centralized processing element; the consensus is then broadcast to each cluster for another round of local updates.
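The consensus-sharing loop described above (local update, consensus at a central element, broadcast, repeat) can be sketched with textbook global-consensus ADMM in scaled form. This sketch assumes an unconstrained MMSE-style objective ½‖y − Hz‖² + (reg/2)‖z‖² split over row-clusters of H, a fixed penalty `rho`, and a fixed iteration count; the thesis's actual objectives, regularizers, and stopping rules may differ, and `consensus_admm` is a hypothetical name.

```python
import numpy as np

def consensus_admm(H_blocks, y_blocks, reg, rho=8.0, iters=500):
    """Scaled-form global-consensus ADMM for
         minimize_z  sum_c 0.5*||y_c - H_c z||^2 + 0.5*reg*||z||^2.
    Cluster c only uses (H_c, y_c); clusters exchange length-U vectors per round."""
    C, U = len(H_blocks), H_blocks[0].shape[1]
    # Local precomputation: each cluster inverts its own U x U matrix once.
    A_inv = [np.linalg.inv(Hc.conj().T @ Hc + rho * np.eye(U)) for Hc in H_blocks]
    b = [Hc.conj().T @ yc for Hc, yc in zip(H_blocks, y_blocks)]
    z = np.zeros(U, dtype=complex)
    u = [np.zeros(U, dtype=complex) for _ in range(C)]
    for _ in range(iters):
        # 1) Local updates (parallel across clusters), pulled toward the current consensus z.
        x = [A_inv[c] @ (b[c] + rho * (z - u[c])) for c in range(C)]
        # 2) Central element averages the messages x_c + u_c and applies the regularizer.
        z = (C * rho / (reg + C * rho)) * (sum(x[c] + u[c] for c in range(C)) / C)
        # 3) z is broadcast; each cluster updates its scaled dual variable.
        u = [u[c] + x[c] - z for c in range(C)]
    return z

# Hypothetical setup: 64 antennas in 4 clusters, 8 users.
rng = np.random.default_rng(1)
B, U, C = 64, 8, 4
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
y = H @ (rng.choice([-1.0, 1.0], U) + 1j * rng.choice([-1.0, 1.0], U))
y += np.sqrt(0.05) * (rng.standard_normal(B) + 1j * rng.standard_normal(B))
reg = 0.05
rows = np.array_split(np.arange(B), C)
z = consensus_admm([H[r] for r in rows], [y[r] for r in rows], reg)
x_ref = np.linalg.solve(H.conj().T @ H + reg * np.eye(U), H.conj().T @ y)
```

At convergence the local copies agree and z solves (HᴴH + reg·I) z = Hᴴy, so the result can be checked against the centralized MMSE solution `x_ref`. Each round moves only length-U vectors between the clusters and the central element.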
After a few rounds of local updates and consensus sharing, a converged global consensus result is obtained. Under this architecture, we solve the uplink data detection and downlink beamforming problems in a decentralized manner using the alternating direction method of multipliers (ADMM) and conjugate gradient methods, and we show superb error-rate performance with minimal loss compared to centralized solutions. To reduce the data-transfer latency across clusters, we further propose a decentralized feedforward architecture that requires only one-shot message passing among clusters to arrive at global detection or beamforming results. With this architecture, we develop multiple variations of detection and beamforming algorithms, with non-linear or linear local solvers and with partially or fully decentralized schemes, that realize trade-offs between error-rate performance, computational complexity, and interconnect bandwidth. To evaluate the hardware efficiency of our proposed methods, we implement the above decentralized detection and beamforming algorithms on multi-GPU systems, using parallel and distributed programming techniques to optimize data-rate performance. Our implementations achieve less than 1 ms latency and over 1 Gb/s data throughput on a high-end multi-GPU platform, and they demonstrate high scalability, supporting hundreds to thousands of antennas in massive MU-MIMO systems.

GPU Accelerated Reconfigurable Detector and Precoder for Massive MIMO SDR Systems (2015-12-02)
Li, Kaipeng; Cavallaro, Joseph; Aazhang, Behnaam; Zhong, Lin
We present a reconfigurable GPU-based unified detector and precoder for massive MIMO software-defined radio (SDR) systems. To enable high throughput, we implement the linear minimum mean square error (LMMSE) detector/precoder and further reduce the algorithm complexity by numerical approximation without sacrificing error-rate performance.
For efficient GPU implementation, we exploit the algorithm's inherent parallelism and take advantage of the GPU's numerous computing cores and hierarchical memories to optimize the kernel computations. We furthermore perform multi-stream scheduling and multi-GPU workload deployment to pipeline multiple detection or precoding tasks on GPU streams, which reduces host-device memory-copy overhead. The flexible design supports both detection and precoding and can switch between a Cholesky-based mode and a conjugate-gradient-based mode to trade accuracy against complexity. The GPU implementation exceeds 250 Mb/s detection and precoding throughput for a 128 × 16 antenna system.

Implicit vs. Explicit Approximate Matrix Inversion for Wideband Massive MU-MIMO Data Detection (Springer, 2017)
Wu, Michael; Yin, Bei; Li, Kaipeng; Dick, Chris; Cavallaro, Joseph R.; Studer, Christoph
Massive multi-user (MU) MIMO wireless technology promises improved spectral efficiency compared with traditional cellular systems. While data-detection algorithms that rely on linear equalization achieve near-optimal error-rate performance for massive MU-MIMO systems, they require solving large linear systems at high throughput and low latency, which results in excessively high receiver complexity. In this paper, we investigate a variety of exact and approximate equalization schemes that solve the system of linear equations either explicitly (by computing a matrix inverse) or implicitly (by directly computing the solution vector). We analyze the associated performance/complexity trade-offs, and we show that for small base-station (BS)-to-user-antenna ratios, exact, implicit data detection using the Cholesky decomposition achieves near-optimal performance at low complexity. In contrast, implicit data detection using approximate equalization methods results in the best trade-off for large BS-to-user-antenna ratios.
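The explicit/implicit distinction can be illustrated on the MMSE system A x = Hᴴy with A = HᴴH + reg·I. The sketch below contrasts an explicit inverse, an exact implicit Cholesky solve, and an approximate implicit solver. The abstract does not say which approximate methods it studies; conjugate gradient (CG) with a fixed iteration count is swapped in here as one representative example, and the function names and sizes are hypothetical.

```python
import numpy as np

def regularized_gram(H, reg):
    return H.conj().T @ H + reg * np.eye(H.shape[1])  # A = H^H H + reg*I (U x U)

def explicit_inverse(H, y, reg):
    """Explicit: compute A^{-1}, form W = A^{-1} H^H, then equalize (W can be reused for many y)."""
    W = np.linalg.inv(regularized_gram(H, reg)) @ H.conj().T
    return W @ y

def implicit_cholesky(H, y, reg):
    """Exact implicit: factor A = L L^H, then solve L w = H^H y and L^H x = w (no inverse formed)."""
    L = np.linalg.cholesky(regularized_gram(H, reg))
    w = np.linalg.solve(L, H.conj().T @ y)  # a triangular solver would exploit L's structure
    return np.linalg.solve(L.conj().T, w)

def implicit_cg(H, y, reg, iters):
    """Approximate implicit: a fixed number of conjugate-gradient steps on A x = H^H y."""
    A, b = regularized_gram(H, reg), H.conj().T @ y
    x = np.zeros_like(b)
    r = b.copy()  # residual b - A x for the starting point x = 0
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(iters):
        if rs < 1e-30:  # already converged; avoid a 0/0 step
            break
        Ap = A @ p
        alpha = rs / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Hypothetical 128 x 16 system, matching the antenna configuration mentioned above.
rng = np.random.default_rng(2)
B, U, reg = 128, 16, 0.1
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
y = rng.standard_normal(B) + 1j * rng.standard_normal(B)
x_exp = explicit_inverse(H, y, reg)
x_chol = implicit_cholesky(H, y, reg)
x_cg2, x_cg16 = implicit_cg(H, y, reg, 2), implicit_cg(H, y, reg, U)
```

Explicit inversion pays for A⁻¹ once and then needs only a matrix-vector product per received vector, which helps when many vectors share one channel matrix. The implicit routes never form A⁻¹, and CG trades accuracy for fewer operations by truncating its iterations (compare `x_cg2` with `x_cg16`).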
By combining the advantages of exact, approximate, implicit, and explicit matrix inversion, we develop a new frequency-adaptive equalizer (FADE), which outperforms existing data-detection methods in terms of performance and complexity for wideband massive MU-MIMO systems.

Low-Complexity Subband Digital Predistortion for Spurious Emission Suppression in Noncontiguous Spectrum Access (IEEE, 2016)
Abdelaziz, Mahmoud; Anttila, Lauri; Tarver, Chance; Li, Kaipeng; Cavallaro, Joseph R.; Valkama, Mikko
Noncontiguous transmission schemes combined with high power-efficiency requirements pose major challenges for radio transmitter and power amplifier (PA) design and implementation. Due to the nonlinear nature of the PA, severe unwanted emissions can occur, which can interfere with neighboring channel signals or even desensitize the device's own receiver in frequency-division duplexing (FDD) transceivers. In this paper, to suppress such unwanted emissions, we propose a low-complexity subband digital predistortion (DPD) solution specifically tailored for spectrally noncontiguous transmission schemes in low-cost devices. The proposed technique mitigates only selected spurious intermodulation distortion components at the PA output, which substantially reduces processing complexity compared with classical linearization solutions. Furthermore, we propose and formulate novel decorrelation-based parameter learning solutions that reduce the computational complexity of parameter estimation and can adaptively track time-varying features. Comprehensive simulation and RF measurement results, obtained with a commercial LTE-Advanced mobile PA, evaluate and validate the effectiveness of the proposed solution in real-world scenarios.
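The decorrelation-based learning idea can be sketched for a single spur: the predistorter injects α·u, where u = x₁²·conj(x₂) is a third-order basis function for the IM3 product near 2f₁ − f₂, and α is adapted until the observed IM3 sub-band is uncorrelated with u. This is a minimal sketch under loudly simplified assumptions: one basis function, a hypothetical PA spur coefficient `beta`, and an idealized observation model in which the feedback receiver sees (beta + α)·u with unit gain. It is not the paper's exact learning rule, and `im3_basis` and `learn_im3_coeff` are hypothetical names.

```python
import numpy as np

def im3_basis(x1, x2):
    """Third-order basis x1^2 * conj(x2) for the IM3 spur near 2*f1 - f2 (baseband-equivalent)."""
    return x1 ** 2 * np.conj(x2)

def learn_im3_coeff(x1, x2, pa_im3, mu=0.05, block=64):
    """Block-wise decorrelation update of a single predistortion coefficient alpha.

    Hypothetical observation model: the IM3 sub-band seen by the feedback receiver is
    (pa_im3 + alpha) * u, i.e. the PA's own spur plus the injected alpha * u at unit gain."""
    u = im3_basis(x1, x2)
    alpha = 0j
    residual = []
    for start in range(0, len(u), block):
        ub = u[start:start + block]
        e = (pa_im3 + alpha) * ub               # observed IM3 sub-band samples
        alpha -= mu * np.vdot(ub, e) / len(ub)  # step against the sample correlation of conj(u) and e
        residual.append(abs(pa_im3 + alpha))    # remaining spur coefficient after this block
    return alpha, residual

rng = np.random.default_rng(3)
n = 64 * 200
x1 = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # carrier 1, unit power
x2 = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # carrier 2, unit power
beta = 0.08 - 0.03j  # hypothetical PA IM3 coefficient
alpha, residual = learn_im3_coeff(x1, x2, beta)
```

Under this model the coefficient error shrinks by roughly a factor (1 − μ·mean|u|²) per block, so `alpha` converges to `-beta`. A real PA's linear gain and phase at the spur frequency would enter the update and change the effective step size.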
The results demonstrate that the proposed solutions achieve highly efficient suppression of the spurious components.

On the achievable rates of decentralized equalization in massive MU-MIMO systems (IEEE, 2017)
Jeon, Charles; Li, Kaipeng; Cavallaro, Joseph R.; Studer, Christoph
Massive multi-user (MU) multiple-input multiple-output (MIMO) promises significant gains in spectral efficiency compared to traditional, small-scale MIMO technology. Linear equalization algorithms, such as zero-forcing (ZF) or minimum mean-square error (MMSE)-based methods, typically rely on centralized processing at the base station (BS), which results in (i) excessively high interconnect and chip input/output data rates and (ii) high computational complexity. In this paper, we investigate the achievable rates of decentralized equalization, which mitigates both of these issues. We consider two distinct BS architectures that partition the antenna array into clusters, each associated with independent radio-frequency chains and signal-processing hardware, and that fuse the results of each cluster in a feedforward network. For both architectures, we consider ZF, MMSE, and a novel non-linear equalization algorithm that builds upon approximate message passing (AMP), and we theoretically analyze the achievable rates of these methods. Our results demonstrate that decentralized equalization with our AMP-based methods incurs no loss, or only a negligible loss, in achievable rates compared to centralized solutions.

Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters (Springer, 2017)
Li, Kaipeng; Ghazi, Amanullah; Tarver, Chance; Boutellier, Jani; Abdelaziz, Mahmoud; Anttila, Lauri; Juntti, Markku; Valkama, Mikko; Cavallaro, Joseph R.
Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters.
While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also introduces extra processing complexity and makes efficient, flexible implementation challenging, especially for mobile cellular transmitters, whose computing power is limited compared with that of base stations. In this paper, we present high-data-rate implementations of broadband DPD on modern embedded processors, such as mobile GPUs and multicore CPUs, using emerging parallel computing techniques to exploit their computing resources. We further verify the suppression effect of DPD experimentally on real radio hardware platforms. Performance evaluation results of our DPD design demonstrate that modern general-purpose mobile processors are highly effective at accelerating DPD processing for a mobile transmitter.
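The abstract does not specify the DPD model or the parallelization scheme, so the following is only a hedged sketch: it assumes a memory polynomial predistorter (a common baseband DPD structure) and exposes data parallelism by splitting the sample stream into blocks, each carrying M − 1 samples of history so that blocks can be processed independently. A thread pool stands in for GPU or multicore workers; `memory_polynomial`, `dpd_block_parallel`, and the coefficient values are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def memory_polynomial(x, coeffs):
    """y[n] = sum_k sum_m coeffs[k, m] * x[n-m] * |x[n-m]|^(2k), with x[n] = 0 for n < 0.
    Row k gives nonlinearity order 2k+1 (1, 3, 5, ...); column m is the memory tap."""
    K, M = coeffs.shape
    y = np.zeros(len(x), dtype=complex)
    for m in range(M):
        xm = np.concatenate([np.zeros(m, dtype=complex), x[:len(x) - m]])  # x delayed by m samples
        for k in range(K):
            y += coeffs[k, m] * xm * np.abs(xm) ** (2 * k)
    return y

def dpd_block_parallel(x, coeffs, n_blocks, workers=4):
    """Split x into blocks, give each block M-1 samples of history, and predistort blocks independently."""
    M = coeffs.shape[1]
    bounds = np.linspace(0, len(x), n_blocks + 1).astype(int)

    def run(i):
        lo, hi = bounds[i], bounds[i + 1]
        start = max(0, lo - (M - 1))  # overlap with the previous block
        return memory_polynomial(x[start:hi], coeffs)[lo - start:]  # keep only this block's outputs

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.concatenate(list(pool.map(run, range(n_blocks))))

rng = np.random.default_rng(4)
x = 0.5 * (rng.standard_normal(4096) + 1j * rng.standard_normal(4096))
coeffs = np.array([[1.0, 0.05, -0.02, 0.01],
                   [-0.1 + 0.02j, 0.01, 0.0, 0.0],
                   [0.02, 0.0, 0.0, 0.0]], dtype=complex)  # hypothetical 3 x 4 coefficient set
y_parallel = dpd_block_parallel(x, coeffs, n_blocks=8)
y_serial = memory_polynomial(x, coeffs)
```

Because each block recomputes only M − 1 overlapping samples, the block-parallel output matches the serial output (`y_parallel` vs `y_serial`) while the blocks can run concurrently; on a GPU, each block would map naturally to its own group of threads.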