Browsing by Author "Shen, Hao"
Item Generalized method to design phase masks for 3D super-resolution microscopy (Optical Society of America, 2019) Wang, Wenxiao; Ye, Fan; Shen, Hao; Moringo, Nicholas A.; Dutta, Chayan; Robinson, Jacob T.; Landes, Christy F.
Point spread function (PSF) engineering by phase modulation is a novel approach to three-dimensional (3D) super-resolution microscopy, with different point spread functions being proposed for specific applications. It is often not easy to achieve the desired shape of engineered point spread functions because it is challenging to determine the correct phase mask. Additionally, a phase mask can either encode 3D space information or additional time information, but not both simultaneously. A robust algorithm for recovering a phase mask to generate arbitrary point spread functions is needed. In this work, a generalized phase mask design method is introduced by performing an optimization. A stochastic gradient descent algorithm and a Gauss-Newton algorithm are developed and compared for their ability to recover the phase masks for previously reported point spread functions. The new Gauss-Newton algorithm converges to a minimum at much higher speeds. This algorithm is used to design a novel stretching-lobe phase mask to encode temporal and 3D spatial information simultaneously. The stretching-lobe phase mask and other masks are fabricated in-house for proof-of-concept using multi-level light lithography and an optimized commercially sourced stretching-lobe phase mask (PM) is validated experimentally to encode 3D spatial and temporal information.
The algorithms’ generalizability is further demonstrated by generating a phase mask that comprises four different letters at different depths.

Item Generalized recovery algorithm for 3D super-resolution microscopy using rotating point spread functions (Springer Nature, 2016) Shuang, Bo; Wang, Wenxiao; Shen, Hao; Tauzin, Lawrence J.; Flatebo, Charlotte; Chen, Jianbo; Moringo, Nicholas A.; Bishop, Logan D.C.; Kelly, Kevin F.; Landes, Christy F.
Super-resolution microscopy with phase masks is a promising technique for 3D imaging and tracking. Due to the complexity of the resultant point spread functions, generalized recovery algorithms are still missing. We introduce a 3D super-resolution recovery algorithm that works for a variety of phase masks generating 3D point spread functions. A fast deconvolution process generates initial guesses, which are further refined by least squares fitting. Overfitting is suppressed using a machine learning determined threshold. Preliminary results on experimental data show that our algorithm can be used to super-localize 3D adsorption events within a porous polymer film and is useful for evaluating potential phase masks. Finally, we demonstrate that parallel computation on graphics processing units can reduce the processing time required for 3D recovery. Simulations reveal that, through desktop parallelization, the ultimate limit of real-time processing is possible.
Our program is the first open-source recovery program for generalized 3D recovery using rotating point spread functions.

Item Highly Scalable On-the-Fly Interleaved Address Generation for UMTS/HSPA+ Parallel Turbo Decoder (24th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2013-06-01) Vosoughi, Aida; Wang, Guohui; Shen, Hao; Cavallaro, Joseph R.; Guo, Yuanbin; CMC
High throughput parallel interleaver design is a major challenge in designing parallel turbo decoders that conform to high data rate requirements of advanced standards such as HSPA+. The hardware complexity of the HSPA+ interleaver makes it difficult to scale to high degrees of parallelism. We propose a novel algorithm and architecture for on-the-fly parallel interleaved address generation in the UMTS/HSPA+ standard that is highly scalable. Our proposed algorithm generates an interleaved memory address from an original input address without building the complete interleaving pattern or storing it; the generated interleaved address can be used directly for interleaved writing to memory blocks. We use an extended Euclidean algorithm for modular multiplicative inversion as a step towards reversed intra-row permutations in the UMTS/HSPA+ standard. As a result, we can determine interleaved addresses from original addresses. We also propose an efficient and scalable hardware architecture for our method. Our design generates 32 interleaved addresses in one cycle and satisfies the data rate requirement of 672 Mbps in HSPA+ while the silicon area and frequency are improved compared to recent related works.

Item Parallel Interleaver Architecture with New Scheduling Scheme for High Throughput Configurable Turbo Decoder (IEEE, 2013-05) Wang, Guohui; Vosoughi, Aida; Shen, Hao; Cavallaro, Joseph R.; Guo, Yuanbin
Parallel architecture is required for high throughput turbo decoders to meet the data rate requirements of emerging wireless communication systems.
However, due to the severe memory conflict problem caused by parallel architectures, the interleaver design has become a major challenge that limits the achievable throughput. Moreover, the high complexity of the interleaver algorithm makes the parallel interleaving address generation hardware very difficult to implement. In this paper, we propose a parallel interleaver architecture that can generate multiple interleaving addresses on-the-fly. We devised a novel scheduling scheme with which we can use more efficient buffer structures to eliminate memory contention. The synthesis results show that the proposed architecture with the new scheduling scheme can significantly reduce memory usage and hardware complexity. The proposed architecture also shows great flexibility and scalability compared to prior work.

Item Parallel Nonbinary LDPC Decoding on GPU (IEEE, 2012-12-01) Wang, Guohui; Shen, Hao; Yin, Bei; Wu, Michael; Sun, Yang; Cavallaro, Joseph R.
Nonbinary Low-Density Parity-Check (LDPC) codes are a class of error-correcting codes constructed over the Galois field GF(q) for q > 2. As extensions of binary LDPC codes, nonbinary LDPC codes can provide better error-correcting performance when the code length is short or moderate, but at a cost of higher decoding complexity. This paper proposes a massively parallel implementation of a nonbinary LDPC decoding accelerator based on a graphics processing unit (GPU) to achieve both great flexibility and scalability. The implementation maps the Min-Max decoding algorithm to the GPU’s massively parallel architecture. We highlight the methodology to partition the decoding task across a heterogeneous platform consisting of the CPU and GPU.
The experimental results show that our GPU-based implementation can achieve high throughput while still providing great flexibility and scalability.

Item Parallel Searching-Based Sphere Detector for MIMO Downlink OFDM Systems (IEEE, 2012-06-01) Radosavljevic, Predrag; Kim, Kyeong Jin; Shen, Hao; Cavallaro, Joseph R.; Center for Multimedia Communication
This paper describes the implementation of a detector with a parallel partial candidate-search algorithm. Two fully independent partial candidate search processes are simultaneously employed for two groups of transmit antennas based on QR decomposition (QRD) and QL decomposition (QLD) of a multiple-input multiple-output (MIMO) channel matrix. By using separate simultaneous candidate searching processes, the proposed implementation of the QRD-QLD searching-based sphere detector provides a smaller latency and a lower computational complexity than the original QRD-M detector for similar error-rate performance in wireless communications systems employing four transmit and four receive antennas with 16-QAM or 64-QAM constellation size. It is shown that in coded MIMO orthogonal frequency division multiplexing (MIMO OFDM) systems, the detection latency and computational complexity of a receiver can be substantially reduced by using the proposed QRD-QLD detector implementation. The QRD-QLD-based sphere detector is also implemented using a field-programmable gate array (FPGA) and an application-specific integrated circuit (ASIC), and its hardware design complexity is compared with that of other sphere detectors reported in the literature.
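
Note on the entry "Highly Scalable On-the-Fly Interleaved Address Generation for UMTS/HSPA+ Parallel Turbo Decoder" above: its abstract mentions an extended Euclidean algorithm for modular multiplicative inversion. The following is a minimal, generic sketch of that textbook technique only. The function names `extended_gcd` and `mod_inverse` are illustrative assumptions, and the sketch does not reproduce how the paper applies the inverse to intra-row permutations or how the hardware computes it.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with a*x + b*y == g, where g = gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Standard Euclid remainder update, carried along with the
        # Bezout coefficients of each remainder.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a: int, m: int) -> int:
    """Return the inverse of a modulo m, i.e. x with (a * x) % m == 1 % m.

    Hypothetical helper for illustration; raises ValueError when no
    inverse exists (gcd(a, m) != 1) or when m is not positive.
    """
    if m <= 0:
        raise ValueError("modulus must be positive")
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return x % m


if __name__ == "__main__":
    inv = mod_inverse(7, 52)
    print(inv, (7 * inv) % 52)
```

An inverse exists only when the value and the modulus are coprime, which is why the helper rejects other inputs. In Python 3.8 and later, the built-in `pow(a, -1, m)` computes the same result and can serve as a cross-check.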