Rice Wireless

Formerly the Center for Multimedia Communications, Rice Wireless is part of the university's Electrical and Computer Engineering Department. More information about the group can be found at http://wireless.rice.edu/.


Recent Submissions

Now showing 1 - 20 of 268
  • Item
    Dataflow Modeling and Design for Cognitive Radio Networks
    (8th International Conference on Cognitive Radio Oriented Wireless Networks, 2013-10-01) Wang, Lai-Huei; Bhattacharyya, Shuvra S.; Vosoughi, Aida; Cavallaro, Joseph R.; Juntti, Markku; Boutellier, Jani; Silven, Olli; Valkama, Mikko; CMC
    Cognitive radio networks present challenges at many levels of design, including configuration, control, and cross-layer optimization. In this paper, we focus primarily on dataflow representations to enable flexibility and reconfigurability in many of the baseband algorithms. Dataflow modeling will be important to provide a layer of abstraction and will be applied to generate flexible baseband representations for cognitive radio testbeds, including the Rice WARP platform. As RF frequency agility and reconfiguration for carrier aggregation are important goals for 4G LTE Advanced systems, we also focus on dataflow analysis for digital pre-distortion algorithms. A new design method called parameterized multidimensional design hierarchy mapping (PMDHM) is presented, along with initial speedup results from applying PMDHM in the mapping of channel estimation onto a GPU architecture.
  • Item
    Highly Scalable On-the-Fly Interleaved Address Generation for UMTS/HSPA+ Parallel Turbo Decoder
    (24th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2013-06-01) Vosoughi, Aida; Wang, Guohui; Shen, Hao; Cavallaro, Joseph R.; Guo, Yuanbin; CMC
    High-throughput parallel interleaver design is a major challenge in designing parallel turbo decoders that conform to the high data rate requirements of advanced standards such as HSPA+. The hardware complexity of the HSPA+ interleaver makes it difficult to scale to high degrees of parallelism. We propose a novel, highly scalable algorithm and architecture for on-the-fly parallel interleaved address generation in the UMTS/HSPA+ standard. Our proposed algorithm generates an interleaved memory address from an original input address without building or storing the complete interleaving pattern; the generated interleaved address can be used directly for interleaved writing to memory blocks. We use an extended Euclidean algorithm for modular multiplicative inversion as a step towards reversing the intra-row permutations in the UMTS/HSPA+ standard. As a result, we can determine interleaved addresses from original addresses. We also propose an efficient and scalable hardware architecture for our method. Our design generates 32 interleaved addresses in one cycle and satisfies the 672 Mbps data rate requirement of HSPA+, while the silicon area and frequency are improved compared to recent related works.
  • Item
    Decision-Directed Channel Estimation Implementation for Spectral Efficiency Improvement in Mobile MIMO-OFDM
    (Springer, 2015) Ketonen, Johanna; Juntti, Markku; Ylioinas, Jari; Cavallaro, Joseph R.
    Channel estimation algorithms and their implementations for mobile receivers are considered in this paper. The 3GPP long term evolution (LTE) based pilot structure is used as a benchmark in a multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) receiver. The decision-directed (DD) space-alternating generalized expectation-maximization (SAGE) algorithm is used to improve the performance over that of the pilot-symbol-based least-squares (LS) channel estimator. The performance is improved at high user velocities, where the pilot symbol density is not sufficient. Minimum mean square error (MMSE) filtering is also used to estimate the channel between pilot symbols. With DD channel estimation, the pilot overhead can be reduced to a third of the LTE pilot overhead, yielding a ten percent increase in data throughput. Complexity reduction and latency issues are considered in the architecture design. The pilot-based LS, MMSE, and SAGE channel estimators are implemented with a high-level synthesis tool and synthesized with the UMC 0.18 μm CMOS technology, and the performance-complexity trade-offs are studied. The MMSE estimator improves on the performance of the simple LS estimator with the LTE pilot structure and has low power consumption. The SAGE estimator has high power consumption but can be used with reduced pilot density to increase the data rate.
  • Item
    GPU-based Acceleration of Symbol Timing Recovery
    (IEEE, 2012-12-20) Kim, Scott C.; Plishker, William L.; Bhattacharyya, Shuvra S.; Cavallaro, Joseph R.; CMC
    This paper presents a novel implementation of graphics processing unit (GPU) based symbol timing recovery using polyphase interpolators to detect symbol timing error. Symbol timing recovery is a compute-intensive procedure that detects and corrects the timing error in a coherent receiver. We provide optimal sample-time timing recovery using a maximum likelihood (ML) estimator to minimize the timing error. This is an iterative, adaptive system that relies on feedback; we therefore present an accelerated implementation that uses a GPU for timing error detection (TED), enabling fast error detection by exploiting the 2D filter structure found in the polyphase interpolator. Our hybrid/heterogeneous CPU and GPU architecture computes a low-complexity, low-noise matched filter (MF) while simultaneously performing TED. We then compare the performance of CPU-based and GPU-based timing recovery for different interpolation rates to minimize the error and improve the detection by up to a factor of 35. We further improve the process by applying GPU optimizations and block processing to increase the throughput even more, all while maintaining the lowest possible sampling rate.
  • Item
    LTE uplink MIMO receiver with low complexity interference cancellation
    (Springer, 2012-11-01) Yin, Bei; Cavallaro, Joseph R.; CMC
    In the LTE/LTE-A uplink receiver, frequency domain equalizers (FDE) are adopted to achieve good performance. However, in multi-tap channels, residual inter-symbol and inter-antenna interference still exists after FDE and degrades the performance. Conventional interference cancellation schemes can minimize this interference by using frequency domain interference cancellation. However, those schemes have high complexity and large feedback latency, especially when a large number of iterations is used. This results in low throughput and requires a large amount of resources in a software defined radio implementation. In this paper, we propose a novel low-complexity interference cancellation scheme to minimize the residual interference in the LTE/LTE-A uplink. Our proposed scheme brings about 2 dB of gain in different channels while adding at most 7.2% complexity to the receiver. The scheme is further implemented on a Xilinx FPGA. Compared to other conventional interference cancellation schemes, our scheme has lower complexity, less data to store, and shorter feedback latency.
  • Item
    Low Complexity Opportunistic Decoder for Network Coding
    (IEEE, 2012-12-01) Yin, Bei; Wu, Michael; Wang, Guohui; Cavallaro, Joseph R.; CMC
    In this paper, we propose a novel opportunistic decoding scheme for network coding which significantly reduces the decoder complexity and increases the throughput. Network coding was proposed to improve network throughput and reliability, especially for multicast transmissions. Although network coding increases network performance, the complexity of the network coding decoder algorithm is still high, especially for higher-dimensional finite fields or larger network codes. Different software and hardware approaches have been proposed to accelerate the decoding algorithm, but the decoder remains the bottleneck for high-speed data transmission. We propose a novel decoding scheme which exploits the structure of the network coding matrix to reduce the decoder complexity and improve throughput. We also implemented the proposed scheme on a Virtex-7 FPGA and compared our implementation to the widely used Gaussian elimination.
  • Item
    Parallel Nonbinary LDPC Decoding on GPU
    (IEEE, 2012-12-01) Wang, Guohui; Shen, Hao; Yin, Bei; Wu, Michael; Sun, Yang; Cavallaro, Joseph R.
    Nonbinary Low-Density Parity-Check (LDPC) codes are a class of error-correcting codes constructed over the Galois field GF(q) for q > 2. As extensions of binary LDPC codes, nonbinary LDPC codes can provide better error-correcting performance when the code length is short or moderate, but at the cost of higher decoding complexity. This paper proposes a massively parallel implementation of a nonbinary LDPC decoding accelerator based on a graphics processing unit (GPU) to achieve both great flexibility and scalability. The implementation maps the Min-Max decoding algorithm to the GPU’s massively parallel architecture. We highlight the methodology for partitioning the decoding task across a heterogeneous platform consisting of the CPU and GPU. The experimental results show that our GPU-based implementation can achieve high throughput while still providing great flexibility and scalability.
  • Item
    Implementation of LS, MMSE and SAGE Channel Estimators for Mobile MIMO-OFDM
    (IEEE, 2012-12-01) Ketonen, Johanna; Juntti, Markku; Ylioinas, Jari; Cavallaro, Joseph R.; CMC
    The use of decision-directed (DD) channel estimation in a multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) downlink receiver is studied in this paper. The 3GPP long term evolution (LTE) based pilot structure is used as a benchmark. The space-alternating generalized expectation-maximization (SAGE) algorithm is used to improve the performance over that of the pilot-symbol-based least-squares (LS) channel estimator. The DD channel estimation improves the performance at high user velocities, where the pilot symbol density is not sufficient. Minimum mean square error (MMSE) filtering can also be used to estimate the channel between pilot symbols. The DD channel estimation can be used to reduce the pilot overhead without any performance degradation by transmitting data instead of pilot symbols. The pilot overhead is reduced to a third of the LTE pilot overhead, obtaining a ten percent increase in throughput. The pilot-based LS, MMSE, and SAGE channel estimators are implemented and the performance-complexity trade-offs are studied.
  • Item
    Flexible N-Way MIMO Detector on GPU
    (IEEE Computer Society, 2012-10-17) Wu, Michael; Yin, Bei; Cavallaro, Joseph R.; CMC
    This paper proposes a flexible Multiple-Input Multiple-Output (MIMO) detector on graphics processing units (GPU). MIMO detection is a key technology in broadband wireless systems such as LTE, WiMAX, and 802.11n. Existing detectors either use costly sorting for better performance or sacrifice sorting for higher throughput. To achieve good performance with high throughput, our detector runs multiple search passes in parallel, where each search pass detects the transmit stream with a different permuted detection order. We show that this flexible detector, including QR decomposition preprocessing, outperforms existing GPU MIMO detectors while maintaining good bit error rate (BER) performance. In addition, this detector can achieve different tradeoffs between throughput and accuracy by changing the number of parallel search passes.
  • Item
    FPGA in Wireless Communications Applications
    (Elsevier, Waltham, MA, 2012-07-12) Amiri, Kiarash; Duarte, Melissa; Cavallaro, Joseph R.; Dick, Chris; Rao, Raghu; Sabharwal, Ashutosh; Center for Multimedia Communication
    In the past decade we have witnessed explosive growth in the wireless communications industry, with over 4 billion subscribers worldwide. While first and second generation systems focused on voice communications, third generation networks (3GPP and 3GPP2) embraced code division multiple access (CDMA) and had a strong focus on enabling wireless data services. As we reflect on the rollout of 3G services, the reality is that first generation 3G systems did not entirely fulfill the promise of high-speed transmission, and the rates supported in practice were much lower than those claimed in the standards. Enhanced 3G systems were subsequently deployed to address the deficiencies. However, the data rate capabilities and network architecture of these systems were insufficient to address the insatiable consumer and business sector demand for the nomadic delivery of media and data-centric services to an increasingly rich set of mobile platforms.
  • Item
    High-Level Design Tools for Complex DSP Applications
    (Elsevier, Waltham, MA, 2012-07-12) Sun, Yang; Amiri, Kiarash; Wang, Guohui; Yin, Bei; Cavallaro, Joseph R.; Ly, Tai; Center for Multimedia Communication
    High-level synthesis design methodology: High-level synthesis (HLS) [1], also known as behavioral synthesis and algorithmic synthesis, is a design process in which a high-level, functional description of a design is automatically compiled into an RTL implementation that meets certain user-specified design constraints. The HLS design description is ‘high level’ compared to RTL in two aspects: design abstraction and specification language.
  • Item
    Low complexity scalable MIMO sphere detection through antenna detection reordering
    (Springer, 2012-07-01) Wu, Michael; Dick, Chris; Sun, Yang; Cavallaro, Joseph R.; Center for Multimedia Communication
    This paper describes a novel low-complexity scalable multiple-input multiple-output (MIMO) detector that requires neither preprocessing nor the optimal squared l2-norm computations to achieve good bit error rate (BER) performance. Unlike existing detectors such as Flexsphere that use preprocessing before MIMO detection to improve performance, the proposed detector instead performs multiple search passes, where each search pass detects the transmit stream with a different permuted detection order. In addition, to reduce the number of multipliers required in the design, we use the l1-norm in place of the optimal squared l2-norm. To ameliorate the BER performance loss due to the l1-norm, we propose squaring and then scaling the l1-norm. By changing the number of parallel search passes and using norm scaling, we show that this design achieves performance comparable to Flexsphere with reduced resource requirements, or achieves BER performance close to exhaustive search with increased resource requirements.
  • Item
    Nonlinear Fault Detection for Hydraulics: Recent Advances in Fault Diagnosis and Fault Tolerance for Mechatronic Systems
    (2002-10-01) Leuschen, Martin L.; Walker, Ian D.; Cavallaro, Joseph R.; Center for Multimedia Communication
    One of the most important areas in the robotics industry is the development of robots capable of working in hazardous environments. As humans cannot safely or cheaply work in these environments, providing a high level of robotic functionality is important. Our work in this area focuses on a fault detection method known as analytical redundancy, or AR. In this paper we discuss the application of our novel, rigorous nonlinear AR technique to a hydraulic servovalve system. AR is a model-based state-space technique that is theoretically guaranteed to derive the maximum number of independent tests of the consistency of sensor data with the system model and past control inputs. Conventional linear AR is only valid for linear sampled-data systems. However, our new nonlinear AR (NLAR) technique extends traditional linear AR’s mathematical guarantee of generating the maximum possible number of independent tests to the nonlinear domain. Thus NLAR allows us to gain the benefits of AR testing for nonlinear systems with both continuous and sampled data.
  • Item
    Fault Detection and Fault Tolerance in Robotics
    (1991-07-01) Visinsky, Monica L.; Walker, Ian D.; Cavallaro, Joseph R.; Center for Multimedia Communication
    Robots are used in inaccessible or hazardous environments in order to alleviate some of the time, cost and risk involved in preparing men to endure these conditions. In order to perform their expected tasks, the robots are often quite complex, thus increasing their potential for failures. If men must be sent into these environments to repair each component failure in the robot, the advantages of using the robot are quickly lost. Fault-tolerant robots are needed that can effectively cope with failures and continue their tasks until repairs can be realistically scheduled. Before fault-tolerant capabilities can be created, methods of detecting and pinpointing failures must be perfected. This paper develops a basic fault tree analysis of a robot in order to obtain a better understanding of where failures can occur and how they contribute to other failures in the robot. The resulting failure flow chart can also be used to analyze the resiliency of the robot in the presence of specific faults. By simulating robot failures and fault detection schemes, the problems involved in detecting failures for robots are explored in more depth. Future work will extend the analyses done in this paper to enhance Trick, a robotic simulation testbed, with fault-tolerant capabilities in an expert system package.
  • Item
    GPU Accelerated Scalable Parallel Decoding of LDPC Codes
    (IEEE, 2011-11-01) Wang, Guohui; Wu, Michael; Sun, Yang; Center for Multimedia Communication
    This paper proposes a flexible low-density parity-check (LDPC) decoder which leverages graphics processing units (GPU) to provide high decoding throughput. LDPC codes are widely adopted by emerging standards for wireless communication systems and storage applications due to their near-capacity error-correcting performance. To achieve high decoding throughput on the GPU, we leverage the parallelism embedded in the check-node and variable-node computations and propose a parallel strategy for partitioning the decoding jobs among the multiprocessors of the GPU. In addition, we propose a scalable multi-codeword decoding scheme to fully utilize the computation resources of the GPU. Furthermore, we develop a novel adaptive performance-tuning method to make our decoder implementation more flexible and scalable. The experimental results show that our LDPC decoder is scalable and flexible, and that the adaptive performance-tuning method can deliver peak performance based on the GPU architecture.
  • Item
    The Use of Fault Trees for the Design of Robots for Hazardous Environments
    (IEEE, 1996-01-01) Walker, Ian D.; Cavallaro, Joseph R.; Center for Multimedia Communication
    This paper addresses the application of fault trees to the analysis of robot manipulator reliability and fault tolerance. Although a common and useful tool in other applications, fault trees have only recently been applied to robots. In addition, most of the fault tree analyses in robotics have focused on qualitative, rather than quantitative, analysis. Robotic manipulators present some special problems, due to the complex and strongly coupled nature of their subsystems, and also their wild response to subsystem failures. Additionally, there is a lack of reliability data for robots and their subsystems. There has traditionally been little emphasis on fault tolerance in the design of industrial robots, and data regarding operational robot failures is relatively scarce.
  • Item
    Maximum Likelihood Multipath Channel Parameter Estimation in CDMA Systems
    (1998-03-01) Sengupta, Chaitali; Hottinen, Ari; Cavallaro, Joseph R.; Aazhang, Behnaam; Center for Multimedia Communication
    The problem addressed in this paper is the estimation of the channel parameters in a Code Division Multiple Access (CDMA) communication system in the presence of multipath effects. Maximum likelihood estimation of these parameters has been investigated in the past, the main drawback being the complexity of the multi-dimensional algorithms. The algorithm presented in this paper elegantly decomposes the multiuser problem into a series of single-user problems. The algorithm first estimates a composite channel impulse response of each user and then extracts the channel parameters of all the paths of each user from the channel impulse response. We evaluate the performance of the algorithm through simulation studies.
  • Item
    Robot Reliability Through Fuzzy Markov Models
    (IEEE, 1998-01-01) Leuschen, Martin L.; Walker, Ian D.; Cavallaro, Joseph R.; Center for Multimedia Communication
    In the past few years, new applications of robots have increased the importance of robotic reliability and fault tolerance. Standard approaches to reliability engineering rely on the probability model, which is often inappropriate for this task due to a lack of sufficient probabilistic information during the design and prototyping phases. Fuzzy logic offers an alternative to the probability paradigm, namely possibility, which is much more appropriate to reliability in the robotic context.
  • Item
    Maximum Weight Basis Decoding of Convolutional Codes
    (IEEE, 2002-11-01) Das, Suman; Erkip, Elza; Cavallaro, Joseph R.; Aazhang, Behnaam; Center for Multimedia Communication
    In this paper we describe a new suboptimal decoding technique for linear codes based on the calculation of a maximum weight basis of the code. The idea is based on estimating the maximum number of locations in a codeword that have the least probability of estimation error without violating the codeword structure. In this paper we discuss the details of the algorithm for a convolutional code. The error-correcting capability of a convolutional code increases with the constraint length of the code. Unfortunately, the decoding complexity of the Viterbi algorithm grows exponentially with the constraint length. We also augment the maximum weight basis algorithm by incorporating ideas from list decoding. The complexity of the algorithm grows only quadratically with the constraint length, and its performance is comparable to that of optimal Viterbi decoding.
  • Item
    Robotic Fault Detection Using Nonlinear Analytical Redundancy
    (IEEE, 2002-05-01) Leuschen, Martin L.; Cavallaro, Joseph R.; Walker, Ian D.; Center for Multimedia Communication
    In this paper we discuss the application of our recently developed nonlinear analytical redundancy (NLAR) fault detection technique to a two-degree-of-freedom robot manipulator. NLAR extends the traditional linear AR technique, which derives the maximum possible number of fault detection tests, into the continuous nonlinear domain. The ability to handle nonlinear systems vastly expands the accuracy and viable applications of the AR technique. The effectiveness of the approach is demonstrated through an example.
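The "Dataflow Modeling and Design for Cognitive Radio Networks" abstract relies on dataflow representations of baseband algorithms. As a hedged illustration only, the sketch below uses synchronous dataflow (SDF), a common dataflow model in which every actor produces and consumes a fixed number of tokens per firing, and solves its balance equations for the repetitions vector. SDF is an assumption here; the paper's PMDHM method is not shown, and the actor names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def repetitions_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    edges: list of (src, dst, tokens_produced_per_firing, tokens_consumed_per_firing).
    Returns the smallest positive integer firing counts, or raises ValueError.
    """
    rate = {actors[0]: Fraction(1)}  # fix one actor's rate, propagate along edges
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # An inconsistent graph has no periodic schedule with bounded buffers.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src} -> {dst}")
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(rate[a] * scale) for a in actors}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical chain: source -> filter (2 produced, 3 consumed) -> sink (1 produced, 2 consumed)
    print(repetitions_vector(["src", "fir", "sink"],
                             [("src", "fir", 2, 3), ("fir", "sink", 1, 2)]))
    # prints {'src': 3, 'fir': 2, 'sink': 1}
```

The repetitions vector tells a scheduler how many times each actor fires per period so that every buffer returns to its initial state; mapping tools can then assign those firings to processors such as a GPU.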
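The HSPA+ interleaver abstract ("Highly Scalable On-the-Fly Interleaved Address Generation") uses the extended Euclidean algorithm for modular multiplicative inversion. The sketch below shows that primitive and one way it can undo an index step of the form k = (j · r) mod (p − 1) with gcd(r, p − 1) = 1. Treating this as the relevant step of the UMTS intra-row permutation is an assumption based on 3GPP TS 25.212, and `invert_index_step` is a hypothetical helper, not the authors' hardware algorithm.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, m):
    """Multiplicative inverse of a modulo m; exists only when gcd(a, m) == 1."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return x % m


def invert_index_step(k, r, p):
    """Hypothetical helper: recover j from k = (j * r) mod (p - 1), assuming gcd(r, p - 1) == 1."""
    return (k * mod_inverse(r, p - 1)) % (p - 1)


if __name__ == "__main__":
    p, r = 7, 5                        # illustrative values; gcd(5, 6) == 1
    forward = [(j * r) % (p - 1) for j in range(p - 1)]
    print(forward)                     # prints [0, 5, 4, 3, 2, 1]
    print([invert_index_step(k, r, p) for k in forward])  # prints [0, 1, 2, 3, 4, 5]
```

Because the inverse is computed once per row parameter, each address can then be mapped with a single modular multiplication instead of searching a stored permutation table.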
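Both channel-estimation abstracts ("Decision-Directed Channel Estimation Implementation..." and "Implementation of LS, MMSE and SAGE Channel Estimators...") start from a pilot-based least-squares (LS) estimator and then need the channel between pilots. The sketch below is a simplified single-antenna version: the LS estimate is Y/X at each pilot subcarrier, and plain linear interpolation (swapped in for the papers' MMSE filtering) fills the remaining subcarriers. Pilot positions and values are hypothetical, not the LTE pilot pattern.

```python
from bisect import bisect_right


def ls_estimate(rx_pilots, tx_pilots):
    """Least-squares channel estimate H = Y / X at each pilot subcarrier (one tap per subcarrier)."""
    return [y / x for y, x in zip(rx_pilots, tx_pilots)]


def linear_interpolate(pilot_idx, pilot_est, n_subcarriers):
    """Estimate every subcarrier by linear interpolation between neighbouring pilots.

    Subcarriers outside the first/last pilot hold the edge value (a simplification).
    pilot_idx must be sorted in increasing order.
    """
    est = []
    for k in range(n_subcarriers):
        if k <= pilot_idx[0]:
            est.append(pilot_est[0])
        elif k >= pilot_idx[-1]:
            est.append(pilot_est[-1])
        else:
            i = bisect_right(pilot_idx, k) - 1  # pilot_idx[i] <= k < pilot_idx[i + 1]
            k0, k1 = pilot_idx[i], pilot_idx[i + 1]
            t = (k - k0) / (k1 - k0)
            est.append((1 - t) * pilot_est[i] + t * pilot_est[i + 1])
    return est


if __name__ == "__main__":
    true_h = [1 + 0.1j * k for k in range(13)]   # hypothetical channel, linear across frequency
    pilots = [0, 4, 8, 12]
    tx = [1, -1, 1j, -1j]
    rx = [true_h[k] * x for k, x in zip(pilots, tx)]  # noise-free for clarity
    est = linear_interpolate(pilots, ls_estimate(rx, tx), 13)
    print(max(abs(a - b) for a, b in zip(est, true_h)))  # tiny, because this channel is linear
```

With noise or at high user velocity the LS values become unreliable and sparse pilots under-sample the channel, which is the gap the abstracts fill with MMSE filtering and decision-directed SAGE iterations.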
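The symbol timing recovery abstract ("GPU-based Acceleration of Symbol Timing Recovery") relies on polyphase interpolators. As a hedged sketch, the code below splits an FIR interpolation filter into L sub-filters (branch p uses taps h[p], h[p+L], ...) and checks that this gives the same output as zero-stuffing followed by filtering. Each branch corresponds to one fractional sampling offset p/L, and the branch × time grid is presumably the kind of 2D structure the abstract exploits. The filter taps are arbitrary, and no ML timing-error detector is shown.

```python
def upsample_then_filter(x, h, L):
    """Reference: insert L-1 zeros between samples, then convolve with h (output length len(x)*L)."""
    up = [0.0] * (len(x) * L)
    up[::L] = x
    return [sum(h[k] * up[n - k] for k in range(len(h)) if 0 <= n - k < len(up))
            for n in range(len(up))]


def polyphase_interpolate(x, h, L):
    """Same output computed with L short sub-filters, skipping all multiplications by stuffed zeros.

    Output sample n = m*L + p comes from branch p applied to the original (non-upsampled) input.
    """
    branches = [h[p::L] for p in range(L)]
    y = [0.0] * (len(x) * L)
    for m in range(len(x)):
        for p, taps in enumerate(branches):
            y[m * L + p] = sum(taps[i] * x[m - i] for i in range(len(taps)) if m - i >= 0)
    return y


if __name__ == "__main__":
    x = [1.0, -0.5, 0.25, 2.0]
    h = [0.1, 0.3, 0.5, 0.3, 0.1, 0.05]   # hypothetical interpolation filter taps
    ref = upsample_then_filter(x, h, 3)
    fast = polyphase_interpolate(x, h, 3)
    print(max(abs(a - b) for a, b in zip(ref, fast)))
```

In a timing loop, only the branch selected by the current timing estimate needs to be evaluated per symbol, which is why the polyphase form suits iterative, feedback-driven recovery.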
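The LTE uplink abstract ("LTE uplink MIMO receiver with low complexity interference cancellation") starts from frequency domain equalization (FDE). The sketch below shows only that FDE stage for a single antenna: symbols are DFT-spread, each subcarrier sees a one-tap channel, and a one-tap MMSE weight conj(H)/(|H|² + σ²) is applied before the inverse DFT. Unit-power symbols, a known channel, and the naive O(N²) DFT are assumptions for brevity; the paper's interference cancellation scheme is not shown.

```python
import cmath


def dft(x):
    """Naive DFT (O(N^2)); adequate for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def mmse_fde(rx, channel, noise_var):
    """One-tap MMSE equalizer per subcarrier, assuming unit-power transmit symbols.

    noise_var == 0 reduces to zero-forcing (division by the channel).
    """
    return [h.conjugate() * y / (abs(h) ** 2 + noise_var) for y, h in zip(rx, channel)]


if __name__ == "__main__":
    symbols = [1, -1, 1j, -1j]                 # hypothetical QPSK-like block
    H = [1 + 0j, 0.5 + 0.5j, -0.8j, 0.3 + 1j]  # hypothetical per-subcarrier channel
    rx = [h * X for h, X in zip(H, dft(symbols))]  # noise-free received subcarriers
    print([complex(round(s.real, 6), round(s.imag, 6)) for s in idft(mmse_fde(rx, H, 0.0))])
```

With a multi-tap channel and noise, the MMSE weights deliberately do not invert the channel exactly, so some inter-symbol interference remains after the inverse DFT; that residual is what the abstract's cancellation stage targets.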
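The network coding abstract ("Low Complexity Opportunistic Decoder for Network Coding") compares its decoder against Gaussian elimination. The sketch below shows that baseline over GF(2), chosen for brevity even though the abstract also discusses larger finite fields: each coded packet is the XOR of the source packets selected by its coefficient row, and Gauss-Jordan elimination recovers the sources. Packets are represented as Python ints; the opportunistic scheme itself is not shown.

```python
def decode_gf2(coeffs, payloads):
    """Recover source packets from coded packets over GF(2) by Gauss-Jordan elimination.

    coeffs[i][j] in {0, 1}: coded packet i is the XOR of every source j with coeffs[i][j] == 1.
    payloads[i]: coded packet i as an int bit-vector.
    Returns the list of source packets, or None if the coefficient matrix is rank-deficient.
    """
    n = len(coeffs[0])
    rows = [list(c) + [p] for c, p in zip(coeffs, payloads)]  # augmented matrix
    pivot_row = 0
    for col in range(n):
        sel = next((r for r in range(pivot_row, len(rows)) if rows[r][col]), None)
        if sel is None:
            return None  # not enough linearly independent (innovative) packets yet
        rows[pivot_row], rows[sel] = rows[sel], rows[pivot_row]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col]:
                # Row addition over GF(2) is XOR, for coefficients and payload alike.
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return [rows[i][n] for i in range(n)]


if __name__ == "__main__":
    # Sources 0b1010, 0b0111, 0b1100 combined with a hypothetical coefficient matrix.
    coeffs = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
    coded = [0b1101, 0b1011, 0b0001]
    print([bin(s) for s in decode_gf2(coeffs, coded)])  # prints ['0b1010', '0b111', '0b1100']
```

The cost of this baseline grows roughly cubically with the number of packets, and the inner row operations are data-dependent, which is consistent with the abstract's point that the decoder becomes the throughput bottleneck.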
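Both MIMO detector abstracts ("Flexible N-Way MIMO Detector on GPU" and "Low complexity scalable MIMO sphere detection through antenna detection reordering") run several passes, each detecting the streams in a different permuted order, and keep the best result. The sketch below illustrates only that multi-pass idea and swaps in a simpler per-pass detector: QR-based successive interference cancellation with hard QPSK decisions. It selects the final candidate by squared l2 distance. The papers' search, the l1-norm scaling, and the choice of which permutations to run are not reproduced; here the first `n_passes` permutations are used, which is an arbitrary assumption.

```python
from itertools import permutations


def slice_qpsk(z):
    """Hard decision to the nearest point of the (unnormalized) QPSK set {±1 ± 1j}."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


def qr_gram_schmidt(cols):
    """Modified Gram-Schmidt on complex column vectors: returns Q columns and upper-triangular R."""
    n = len(cols)
    q = [list(c) for c in cols]
    r = [[0j] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = sum(abs(v) ** 2 for v in q[k]) ** 0.5
        q[k] = [v / r[k][k] for v in q[k]]
        for j in range(k + 1, n):
            r[k][j] = sum(a.conjugate() * b for a, b in zip(q[k], q[j]))
            q[j] = [b - r[k][j] * a for a, b in zip(q[k], q[j])]
    return q, r


def sic_pass(y, h_cols, order):
    """One detection pass; h_cols[j] is column j of H, and the last stream in `order` is detected first."""
    q, r = qr_gram_schmidt([h_cols[j] for j in order])
    z = [sum(a.conjugate() * b for a, b in zip(qk, y)) for qk in q]  # z = Q^H y
    n = len(order)
    s = [0j] * n
    for k in reversed(range(n)):
        interference = sum(r[k][j] * s[j] for j in range(k + 1, n))
        s[k] = slice_qpsk((z[k] - interference) / r[k][k])
    result = [0j] * n
    for pos, stream in enumerate(order):
        result[stream] = s[pos]  # undo the permutation
    return result


def multi_pass_detect(y, h_cols, n_passes):
    """Run up to n_passes passes with different detection orders; keep the smallest ||y - Hs||^2."""
    n = len(h_cols)
    best, best_metric = None, float("inf")
    for order in list(permutations(range(n)))[:n_passes]:
        s = sic_pass(y, h_cols, order)
        residual = [y[i] - sum(h_cols[j][i] * s[j] for j in range(n)) for i in range(len(y))]
        metric = sum(abs(e) ** 2 for e in residual)
        if metric < best_metric:
            best, best_metric = s, metric
    return best
```

More passes raise the chance that one detection order avoids early error propagation, at a proportional cost in computation; this mirrors the throughput/accuracy knob the abstracts describe, since independent passes map naturally onto parallel hardware.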
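The fault tree abstracts ("Fault Detection and Fault Tolerance in Robotics" and "The Use of Fault Trees for the Design of Robots for Hazardous Environments") note that robotic fault tree work has been mostly qualitative. As a hedged example of the quantitative side, the sketch below evaluates a top-event probability through AND/OR gates. It assumes statistically independent basic events that each appear only once in the tree; the event names and probabilities are hypothetical, not data from either paper.

```python
from math import prod


def evaluate(node, p):
    """Probability of the event represented by `node`.

    node is either a basic-event name (str) or a tuple (gate, [children]) with gate "AND" or "OR".
    Assumes independent basic events, each appearing once (no repeated events).
    """
    if isinstance(node, str):
        return p[node]
    gate, children = node
    probs = [evaluate(c, p) for c in children]
    if gate == "AND":
        return prod(probs)                          # all children must occur
    if gate == "OR":
        return 1.0 - prod(1.0 - q for q in probs)   # at least one child occurs
    raise ValueError(f"unknown gate {gate!r}")


if __name__ == "__main__":
    # Hypothetical tree: a joint fails if its motor fails, or if both redundant sensors fail.
    tree = ("OR", ["motor_fails", ("AND", ["encoder_fails", "tachometer_fails"])])
    p = {"motor_fails": 0.01, "encoder_fails": 0.05, "tachometer_fails": 0.04}
    print(round(evaluate(tree, p), 6))  # prints 0.01198
```

The example also shows why redundancy helps: the AND gate turns two 4 to 5 percent sensor failures into a 0.2 percent contribution. Trees with shared events would need minimal cut sets or exact inclusion-exclusion instead of this direct recursion.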
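The GPU LDPC abstract ("GPU Accelerated Scalable Parallel Decoding of LDPC Codes") parallelizes the check-node and variable-node computations. The abstract does not say which check-node rule is used, so the sketch below assumes the widely used min-sum approximation. Each outgoing message on an edge takes the product of the signs and the minimum magnitude of all *other* incoming log-likelihood ratios (LLRs), computed in one pass by tracking the two smallest magnitudes. The nonbinary Min-Max decoder from the other GPU paper is a different rule and is not shown.

```python
def min_sum_check_node(incoming):
    """Min-sum check-node update for one check node of degree >= 2.

    incoming: LLRs arriving from the connected variable nodes.
    Output i = (product of signs of all j != i) * min_{j != i} |incoming[j]|.
    """
    signs = [1 if m >= 0 else -1 for m in incoming]
    total_sign = 1
    for s in signs:
        total_sign *= s
    mags = [abs(m) for m in incoming]
    # Track the smallest and second-smallest magnitudes so each output is O(1).
    idx_min = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx_min]
    min2 = min(m for i, m in enumerate(mags) if i != idx_min)
    # Multiplying by signs[i] (which is ±1) removes edge i's own sign from the total.
    return [total_sign * signs[i] * (min2 if i == idx_min else min1) for i in range(len(incoming))]


if __name__ == "__main__":
    print(min_sum_check_node([2.0, -0.5, 1.5, -3.0]))  # prints [0.5, -1.5, 0.5, -0.5]
```

Every check node can run this update independently, which is the kind of parallelism a GPU exploits by assigning check nodes (and, across codewords, whole decoders) to separate threads.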