Browsing by Author "Juntti, Markku"
Now showing 1 - 17 of 17
Item ARCHITECTURE DESIGN AND IMPLEMENTATION OF THE INCREASING RADIUS - LIST SPHERE DETECTOR ALGORITHM (IEEE, 2009-04-01) Myllylä, Markus; Juntti, Markku; Cavallaro, Joseph R.; Center for Multimedia Communication
A list sphere detector (LSD) is an enhancement of a sphere detector (SD) that can be used to approximate the optimal MAP detector. In this paper, we introduce a novel architecture for the increasing radius (IR)-LSD algorithm, which is based on Dijkstra’s algorithm. Possibilities for parallelism are introduced in the presented architecture, which is also scalable to different multiple-input multiple-output (MIMO) systems. The novel architecture is implemented on a Virtex-IV field programmable gate array (FPGA) chip using the Catapult C Synthesis tool from Mentor Graphics, which is based on the high-level ANSI C++ language. The word lengths used, the latency of the design, and the required resources are presented and analyzed for a 4 x 4 MIMO system with 16-quadrature amplitude modulation (QAM). The detector implementation achieves a maximum throughput of 12.1 Mbps at high signal-to-noise ratio (SNR).

Item Architecture Design and Implementation of the Metric First List Sphere Detector Algorithm (IEEE, 2011-05-01) Myllylä, Markus; Cavallaro, Joseph R.; Juntti, Markku; Center for Multimedia Communication
Soft-output detection of a multiple-input–multiple-output (MIMO) signal poses a significant challenge in future wireless systems. In this paper, we introduce a soft-output modified metric first (MMF)-LSD algorithm for MIMO detection. We design a scalable architecture and describe a method to decrease memory requirements. We provide implementation results for a spatial multiplexing (SM) system with four transmitted streams and with 16- and 64-quadrature amplitude modulation (QAM) on a 0.18-μm CMOS application specific integrated circuit (ASIC) technology.
The MMF-LSD implementation is more efficient than the depth first (DF)-LSD in the crucial low signal-to-noise ratio (SNR) region, and the detection rate of the 64-QAM implementation is 39.2 Mbps at 26 dB with a complexity of 48.2 kGEs.

Item ASIC Implementation Comparison of SIC and LSD Receivers for MIMO-OFDM (IEEE, 2008-10-01) Ketonen, Johanna; Myllylä, Markus; Juntti, Markku; Cavallaro, Joseph R.; Center for Multimedia Communication
MIMO-OFDM receivers with horizontal encoding are considered in this paper. The successive interference cancellation (SIC) algorithm is compared to the K-best list sphere detector (LSD). A modification to the K-best LSD algorithm is introduced. The SIC and K-best LSD receivers are designed for a 2 x 2 antenna system with 64-quadrature amplitude modulation (QAM). The ASIC implementation results for both architectures are presented. The K-best LSD outperforms the SIC receiver in bad channel conditions, but the SIC receiver performs better in channels with less correlated MIMO streams. The latency of the K-best LSD is large due to the high modulation order and list size. The throughput of the SIC receiver is more than 6 times higher than that of the K-best LSD.

Item COMPARISON OF TWO NOVEL LIST SPHERE DETECTOR ALGORITHMS FOR MIMO-OFDM SYSTEMS (IEEE, 2006-09-01) Myllylä, Markus; Silvola, Pirkka; Juntti, Markku; Cavallaro, Joseph R.; Center for Multimedia Communication
In this paper, the complexity and performance of two novel list sphere detector (LSD) algorithms are studied and evaluated in a multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) system. The LSDs are based on the K-best and the Schnorr-Euchner enumeration (SEE) algorithms. The required list sizes for the LSD algorithms are determined for a 2×2 system with 4-quadrature amplitude modulation (QAM), 16-QAM, and 64-QAM.
The complexity of the algorithms is compared by studying, in computer simulations, the number of nodes each algorithm visits per received symbol vector. The SEE based LSD algorithm is found to be less complex than the K-best based LSD algorithm and a feasible choice for implementation.

Item Complexity Analysis of MMSE Detector Architectures for MIMO OFDM Systems (IEEE, 2005-11-01) Myllylä, Markus; Hintikka, Juha-Matti; Cavallaro, Joseph R.; Juntti, Markku; Limingoja, Matti; Byman, Aaron; Center for Multimedia Communication
In this paper, a field programmable gate array (FPGA) implementation of a linear minimum mean square error (LMMSE) detector is considered for MIMO-OFDM systems. Two square root free algorithms based on QR decomposition (QRD) are introduced for the implementation of the LMMSE detector. Both algorithms are based on QRD via Givens rotations, namely the coordinate rotation digital computer (CORDIC) and squared Givens rotation (SGR) algorithms. Linear and triangular shaped array architectures are considered to exploit the parallelism in the computations. An FPGA hardware implementation is presented, and the computational complexity of each implementation is evaluated and compared.

Item Dataflow Modeling and Design for Cognitive Radio Networks (8th International Conference on Cognitive Radio Oriented Wireless Networks, 2013-10-01) Wang, Lai-Huei; Bhattacharyya, Shuvra S.; Vosoughi, Aida; Cavallaro, Joseph R.; Juntti, Markku; Boutellier, Jani; Silven, Olli; Valkama, Mikko; CMC
Cognitive radio networks present challenges at many levels of design including configuration, control, and cross-layer optimization. In this paper, we focus primarily on dataflow representations to enable flexibility and reconfigurability in many of the baseband algorithms. Dataflow modeling will be important to provide a layer of abstraction and will be applied to generate flexible baseband representations for cognitive radio testbeds, including the Rice WARP platform.
As RF frequency agility and reconfiguration for carrier aggregation are important goals for 4G LTE Advanced systems, we also focus on dataflow analysis for digital pre-distortion algorithms. A new design method called parameterized multidimensional design hierarchy mapping (PMDHM) is presented, along with initial speedup results from applying PMDHM in the mapping of channel estimation onto a GPU architecture.

Item Decision-Directed Channel Estimation Implementation for Spectral Efficiency Improvement in Mobile MIMO-OFDM (Springer, 2015) Ketonen, Johanna; Juntti, Markku; Ylioinas, Jari; Cavallaro, Joseph R.
Channel estimation algorithms and their implementations for mobile receivers are considered in this paper. The 3GPP long term evolution (LTE) based pilot structure is used as a benchmark in a multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) receiver. The decision directed (DD) space alternating generalized expectation-maximization (SAGE) algorithm is used to improve on the performance of the pilot symbol based least-squares (LS) channel estimator. The performance is improved at high user velocities, where the pilot symbol density is not sufficient. Minimum mean square error (MMSE) filtering is also used in estimating the channel in between pilot symbols. The pilot overhead can be reduced to a third of the LTE pilot overhead with DD channel estimation, obtaining a ten percent increase in data throughput. Complexity reduction and latency issues are considered in the architecture design. The pilot based LS, MMSE and SAGE channel estimators are implemented with a high level synthesis tool and synthesized with the UMC 0.18 μm CMOS technology, and the performance-complexity trade-offs are studied. The MMSE estimator improves on the performance of the simple LS estimator with the LTE pilot structure and has low power consumption.
The SAGE estimator has high power consumption but can be used with reduced pilot density to increase the data rate.

Item Design Space Exploration of Parallel Algorithms and Architectures for Wireless Communication and Mobile Computing Systems (2014-10-30) Wang, Guohui; Cavallaro, Joseph R.; Sarkar, Vivek; Zhong, Lin; Juntti, Markku
During the past several years, there has been a trend for modern mobile SoC (system-on-chip) chipsets to incorporate in a single chip the functionality of several general purpose processors and application-specific accelerators to reduce the cost, the power consumption and the communication overhead. Given the ever-growing performance requirements and strict power constraints, the existence of different types of signal processing workloads has posed challenges to the mapping of computationally intensive algorithms to the heterogeneous architecture of mobile SoCs. Many such signal processing workloads, such as channel decoding for wireless communication modems and mobile computer vision applications, have high computational complexity, which requires accelerators implemented with parallel algorithms and architectures to meet the performance requirements. Partitioning the workloads and deploying them to the appropriate components of mobile chipsets are crucial to fully utilize the mobile SoC's heterogeneous architecture. The goal of this thesis is to study parallel algorithms and architectures of high performance signal processing accelerators for several representative application workloads in wireless communication and mobile computing systems. We explore the design space of the parallel algorithms and architectures and highlight the workload partitioning and architecture-aware optimization schemes, including algorithmic optimization, data structure optimization, and memory access optimization, to improve the throughput performance and hardware (or energy) efficiency.
As case studies, we first propose a contention-free interleaver architecture for parallel turbo decoding, which enables a high throughput multi-standard turbo decoding ASIC (application-specific integrated circuit) with efficient hardware. Secondly, we propose a massively parallel LDPC (low-density parity-check) decoding algorithm and implementation using a GPU (graphics processing unit), which leads to high throughput and low latency LDPC decoding for practical SDR (software-defined radio) systems. Furthermore, we take advantage of the heterogeneous mobile CPU and GPU to accelerate representative mobile computer vision algorithms such as image editing and local feature extraction algorithms. Based on algorithm analysis and experimental results from the above case studies, we finally explore the design space and compare the performance of accelerator architectures for wireless communication and mobile vision use cases. We show that the heterogeneous architecture of mobile systems is the key to efficiently accelerating parallel algorithms in order to meet the growing requirements of performance, efficiency, and flexibility.

Item The effect of LLR clipping to the complexity of list sphere detector algorithms (IEEE, 2007-11-01) Myllylä, Markus; Antikainen, Juho; Juntti, Markku; Cavallaro, Joseph R.; Center for Multimedia Communication
Optimal detection for a coded system requires maximum a posteriori (MAP) detection. A list sphere detector (LSD) can be used to approximate the MAP detector. Depending on the list size used, the LSD provides a tradeoff between performance and computational complexity. The LSD output candidate list is used to calculate an approximation of the probability log-likelihood ratio (LLR) of each transmitted bit. For a good approximation, the list should be large enough and should include at least one candidate for each of the two possible bit values.
The use of a small list size causes inaccurate and, especially, very large LLRs that prevent the decoder from correcting the falsely detected signals and, thus, degrade performance. We study the effect of LLR clipping on the performance and complexity of the LSD algorithm. We show that by limiting the dynamic range of the LLR, the required LSD list size can be decreased, and, thus, the complexity of the algorithms is decreased. The optimal dynamic range values for LLR clipping are determined, and the effect of the clipping on the complexity of the LSD algorithms is analyzed.

Item The Effect of Preprocessing to the Complexity of List Sphere Detector Algorithms (WPMC, 2008-09-01) Myllylä, Markus; Juntti, Markku; Cavallaro, Joseph R.; Center for Multimedia Communication
A list sphere detector (LSD) is an enhancement of a sphere detector (SD) that can be used to approximate the soft output MAP detector used in the detection of multiple-input multiple-output (MIMO) signals. The LSD algorithm executes a tree search on the given lattice and returns a candidate list. The LSD algorithm complexity, i.e., the number of visited nodes in the search tree, can be decreased by applying proper ordering of the transmitted spatial streams in the detection. In this paper, we study the effect of two sophisticated preprocessing methods, the channel matrix column ordering based on the Euclidean norm and the sorted QR decomposition (SQRD), on the performance and complexity of the LSD algorithms and compare them to the traditional QR decomposition (QRD). We show that the SQRD preprocessing is a simple way to decrease the complexity of the LSD: it decreases the number of visited nodes by approximately 20-30% compared to the QRD, which results in a significant number of saved arithmetic operations in the LSD.
We also show that the plain channel matrix column ordering is not a feasible preprocessing method for the LSD in highly correlated channel realizations.

Item Implementation and Complexity Analysis of List Sphere Detector for MIMO-OFDM systems (IEEE, 2008-10-01) Myllylä, Markus; Juntti, Markku; Cavallaro, Joseph R.; Center for Multimedia Communication
A list sphere detector (LSD) is an enhancement of a sphere detector (SD) that can be used to approximate the soft output maximum a posteriori probability (MAP) detector used in the detection of multiple-input multiple-output (MIMO) signals. The LSD consists of three different parts: the preprocessing unit, the LSD algorithm unit and the log-likelihood ratio (LLR) calculation unit. Architecture design is the key to enabling an efficient implementation of the LSD. In this paper, we design the architecture for the whole detector structure and exploit the parallelism and pipelining possibilities of the presented architecture units. The designed architecture is implemented in a field programmable gate array (FPGA) using the Mentor Graphics Catapult C tool. We show that a scalable architecture can be designed for the LSD. The LSD is also shown to be feasible for practical implementation, and the implementation complexity and latency results are presented.

Item Implementation aspects of list sphere decoder algorithms for MIMO-OFDM systems (Elsevier, 2010-10-01) Myllylä, Markus; Cavallaro, Joseph R.; Juntti, Markku; Center for Multimedia Communication
A list sphere decoder (LSD) can be used to approximate the optimal maximum a posteriori (MAP) detector for the detection of multiple-input multiple-output (MIMO) signals. In this paper, we consider two LSD algorithms with different search methods and study some algorithm design choices which relate to the performance and computational complexity of the algorithm.
We show that by limiting the dynamic range of the log-likelihood ratio, the required LSD list size can be lowered, and, thus, the complexity of the LSD algorithm is decreased. We compare the real and the complex-valued signal models and their impact on the complexity of the algorithms. We show that the real-valued signal model is clearly the less complex choice and a better alternative for implementation. We also show that the complexity of the sequential search LSD algorithm can be reduced by limiting the maximum number of checked nodes without sacrificing the performance of the system. Finally, we study the complexity and performance of an iterative receiver, analyze the tradeoff choices between complexity and performance, and show that the additional computational cost in the LSD is justified to get a better soft-output approximation.

Item Implementation of LS, MMSE and SAGE Channel Estimators for Mobile MIMO-OFDM (IEEE, 2012-12-01) Ketonen, Johanna; Juntti, Markku; Ylioinas, Jari; Cavallaro, Joseph R.; CMC
The use of decision directed (DD) channel estimation in a multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) downlink receiver is studied in this paper. The 3GPP long term evolution (LTE) based pilot structure is used as a benchmark. The space-alternating generalized expectation-maximization (SAGE) algorithm is used to improve on the performance of the pilot symbol based least-squares (LS) channel estimator. The DD channel estimation improves the performance at high user velocities, where the pilot symbol density is not sufficient. Minimum mean square error (MMSE) filtering can also be used in estimating the channel in between pilot symbols. The DD channel estimation can be used to reduce the pilot overhead without any performance degradation by transmitting data instead of pilot symbols. The pilot overhead is reduced to a third of the LTE pilot overhead, obtaining a ten percent increase in throughput.
The pilot based LS, MMSE and SAGE channel estimators are implemented, and the performance-complexity trade-offs are studied.

Item Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters (Springer, 2017) Li, Kaipeng; Ghazi, Amanullah; Tarver, Chance; Boutellier, Jani; Abdelaziz, Mahmoud; Anttila, Lauri; Juntti, Markku; Valkama, Mikko; Cavallaro, Joseph R.
Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also introduces extra processing complexity and poses challenges for efficient and flexible implementations, especially for mobile cellular transmitters, considering their limited computing power compared to base stations. In this paper, we present high data rate implementations of broadband DPD on modern embedded processors, such as mobile GPU and multicore CPU, by taking advantage of emerging parallel computing techniques for exploiting their computing resources. We further verify the suppression effect of DPD experimentally on real radio hardware platforms. Performance evaluation results of our DPD design demonstrate the high efficacy of modern general purpose mobile processors in accelerating DPD processing for a mobile transmitter.

Item Performance - Complexity Comparison of Receivers for a LTE MIMO–OFDM System (IEEE, 2010-06-01) Ketonen, Johanna; Juntti, Markku; Cavallaro, Joseph R.; Center for Multimedia Communication
Implementation of receivers for spatial multiplexing multiple-input multiple-output (MIMO) orthogonal-frequency-division-multiplexing (OFDM) systems is considered. The linear minimum mean-square error (LMMSE) detector and the K-best list sphere detector (LSD) are compared to the iterative successive interference cancellation (SIC) detector and the iterative K-best LSD.
The performance of the algorithms is evaluated in a 3G long-term evolution (LTE) system. The SIC algorithm is found to perform worse than the K-best LSD when the MIMO channels are highly correlated, while the performance difference diminishes when the correlation decreases. The receivers are designed for 2X2 and 4X4 antenna systems and three different modulation schemes. Complexity results for FPGA and ASIC implementations are found. A modification to the K-best LSD which increases its detection rate is introduced. The ASIC receivers are designed to meet the decoding throughput requirements in LTE, and the K-best LSD is found to be the most complex receiver although it gives the best reliable data transmission throughput. The SIC receiver has the best performance–complexity tradeoff in the 2X2 system, but in the 4X4 case, the K-best LSD is the most efficient. A receiver architecture which could be reconfigured to use a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver.

Item Performance Evaluation of Two LMMSE Detectors in a MIMO-OFDM Hardware Testbed (IEEE, 2006-11-01) Myllylä, Markus; Juntti, Markku; Limingoja, Matti; Byman, Aaron; Cavallaro, Joseph R.; Center for Multimedia Communication
The performance of two field programmable gate array (FPGA) implementations of a linear minimum mean square error (LMMSE) based detector is evaluated in real-time radio channels. Two square root free algorithms based on the QR decomposition (QRD) via Givens rotations, namely the coordinate rotation digital computer (CORDIC) and squared Givens rotation (SGR) algorithms, are applied for the LMMSE detector implementation with pipelined systolic array architectures. The implementations are mapped to the Elektrobit 2 x 2 multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) hardware testbed for 4G MIMO systems (EB4G).
The presented measurements are made with a Propsim C8 MIMO channel emulator and compared to the simulated results.

Item Receiver Implementation for MIMO-OFDM with AMC and Precoding (IEEE, 2009-11-01) Ketonen, Johanna; Juntti, Markku; Cavallaro, Joseph R.; Center for Multimedia Communication
Receivers for horizontally encoded LTE based MIMO-OFDM systems are considered in this paper. Adaptive modulation and coding (AMC) is used as well as precoding. The linear minimum mean square error (LMMSE), successive interference cancellation (SIC) and K-best list sphere detectors (LSD) are compared. The receivers were designed and implemented for 2×2 and 4×4 antenna systems and meet the decoding rate requirement in LTE, i.e., 210 Mb/s in 2×2 and 405 Mb/s in 4×4 antenna systems. The results show that the performance of the receivers is similar at low SNR but the performance difference increases when a higher rank transmission is used. The K-best LSD has the highest performance and complexity. A simpler receiver could be used at low SNRs to save power and a more complex receiver at high SNRs when a higher goodput is needed.
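
For readers unfamiliar with the LMMSE detector that several of the entries above compare against SIC and K-best LSD receivers, the following is a minimal sketch of the textbook 2×2 case. It is not taken from any of the listed papers: the function names (`lmmse_detect`, `slice_qpsk`), the example channel matrix, and the QPSK slicer are hypothetical, and `noise_var` is assumed to be the noise variance normalized by the symbol energy.

```python
# Illustrative 2x2 LMMSE detection, x_hat = (H^H H + noise_var * I)^-1 H^H y.
# All names and numeric values here are assumptions made for the example.

def hermitian(m):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[m[r][c].conjugate() for r in range(len(m))]
            for c in range(len(m[0]))]

def matmul(a, b):
    """Plain matrix product for small complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def lmmse_detect(h, y, noise_var):
    """Return soft symbol estimates for a 2x2 channel h and received vector y."""
    hh = hermitian(h)
    gram = matmul(hh, h)
    for i in range(2):
        gram[i][i] += noise_var          # regularize with the noise term
    w = matmul(inverse_2x2(gram), hh)    # LMMSE filter matrix
    return [row[0] for row in matmul(w, [[v] for v in y])]

def slice_qpsk(z):
    """Hard decision onto the unnormalized QPSK points (+-1 +-1j)."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)

# Hypothetical channel and transmitted symbols; noiseless reception for clarity.
h = [[1 + 0.5j, 0.3 - 0.2j], [0.2 + 0.1j, 0.9 - 0.4j]]
x = [1 + 1j, -1 + 1j]
y = [h[0][0] * x[0] + h[0][1] * x[1], h[1][0] * x[0] + h[1][1] * x[1]]

x_hat = lmmse_detect(h, y, noise_var=0.01)
detected = [slice_qpsk(z) for z in x_hat]
```

With `noise_var` set to zero the filter reduces to zero-forcing, which recovers the transmitted symbols exactly in this noiseless example. The hardware implementations described in the entries above use QR-decomposition-based formulations rather than the explicit matrix inverse shown here.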