Center for Research Computing

Permanent URI for this collection

https://hdl.handle.net/1911/114452

The Center for Research Computing (CRC) supports computational work by Rice faculty, staff, and student researchers. In cases where the lead author deems these contributions to merit an explicit acknowledgement in the paper or dataset, or the lead author is CRC staff, that item is manually added to this collection (in addition to any other collections it may already belong to).


Efficient optimization of memory accesses in parallel programs
(2010) Barik, Rajkishore; Sarkar, Vivek
The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low power cores. As multi-core processors are becoming ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges. The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits several scalar replacement transformation opportunities compared to many existing memory models. The low-level optimizations include a novel approach to register allocation that retains the compile time and space efficiency of Linear Scan, while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible. Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. 
When compared to the Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3% on a POWER5 processor. With the experimental evaluations combined with the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures.
From high-level tasks to low-level motions: Motion planning for high-dimensional nonlinear hybrid robotic systems
(2008) Plaku, Erion; Kavraki, Lydia E.
A significant challenge of autonomous robotics in transportation, exploration, and search-and-rescue missions lies in the area of motion planning. The overall objective is to enable robots to automatically plan the low-level motions needed to accomplish assigned high-level tasks. Toward this goal, this thesis proposes a novel multi-layered approach, termed Synergic Combination of Layers of Planning (SyCLoP), that synergically combines high-level discrete planning and low-level motion planning. High-level discrete planning, which draws from research in AI and logic, guides low-level motion planning during the search for a solution. Information gathered during the search is in turn fed back from the low-level to the high-level layer in order to improve the high-level plan in the next iteration. In this way, high-level plans become increasingly useful in guiding the low-level motion planner toward a solution. This synergic combination of high-level discrete planning and low-level motion planning allows SyCLoP to solve motion-planning problems with respect to rich models of the robot and the physical world. This facilitates the design of feedback controllers that enable the robot to execute in the physical world solutions obtained in simulation. In particular, SyCLoP effectively solves challenging motion-planning problems that incorporate robot dynamics, physics-based simulations, and hybrid systems. Hybrid systems move beyond continuous models by employing discrete logic to instantaneously modify the underlying robot dynamics to respond to mishaps or unanticipated changes in the environment. Experiments in this thesis show that SyCLoP obtains significant computational speedup of one to two orders of magnitude when compared to state-of-the-art motion planners.
In addition to planning motions that allow the robot to reach a desired destination while avoiding collisions, SyCLoP can take into account high-level tasks specified using the expressiveness of linear temporal logic (LTL). LTL allows for complex specifications, such as sequencing, coverage, and other combinations of temporal objectives. Going beyond motion planning, SyCLoP also provides a useful framework for discovering violations of safety properties in hybrid systems.
Grid-centric scheduling strategies for workflow applications
(2010) Zhang, Yang; Cooper, Keith D.
Grid computing faces a great challenge because the resources are not localized, but distributed, heterogeneous and dynamic. Thus, it is essential to provide a set of programming tools that execute an application on the Grid resources with as little input from the user as possible. The thesis of this work is that Grid-centric scheduling techniques of workflow applications can provide good usability of the Grid environment by reliably executing the application on a large scale distributed system with good performance. We support our thesis with new and effective approaches in the following five aspects. First, we modeled the performance of the existing scheduling approaches in a multi-cluster Grid environment. We implemented several widely-used scheduling algorithms and identified the best candidate. The study further introduced a new measurement, based on our experiments, which can improve the schedule quality of some scheduling algorithms as much as 20-fold in a multi-cluster Grid environment. Second, we studied the scalability of the existing Grid scheduling algorithms. To deal with Grid systems consisting of hundreds of thousands of resources, we designed and implemented a novel approach that performs explicit resource selection decoupled from scheduling. Our experimental evaluation confirmed that our decoupled approach can be scalable in such an environment without sacrificing the quality of the schedule by more than 10%. Third, we proposed solutions to address the dynamic nature of Grid computing with a new cluster-based hybrid scheduling mechanism. Our experimental results collected from real executions on production clusters demonstrated that this approach produces programs running 30% to 100% faster than the other scheduling approaches we implemented on both reserved and shared resources. Fourth, we improved the reliability of Grid computing by incorporating fault-tolerance and recovery mechanisms into the workflow application execution.
Our experiments on a simulated multi-cluster Grid environment demonstrated the effectiveness of our approach and also characterized the three-way trade-off between reliability, performance and resource usage when executing a workflow application. Finally, we improved the large batch-queue wait time often found in production Grid clusters. We developed a novel approach to partition the workflow application and submit them judiciously to achieve less total batch-queue wait time. The experimental results derived from production site batch queue logs show that our approach can reduce total wait time by as much as 70%. Our approaches combined can greatly improve the usability of Grid computing while increasing the performance of workflow applications on a multi-cluster Grid environment.
Mie and Finite-Element Simulations of the Optical and Plasmonic Properties of Micro- and Nanostructures
(2012) Hu, Ying Samuel; Drezek, Rebekah A.
A Mie-based code is developed for multilayer concentric spheres. The code is used in conjunction with a finite-element package to investigate the plasmonic and optical properties of micro- and nanostructures. For plasmonic nanostructures, gold-silica-gold multilayer nanoshells are computationally investigated. A plasmon hybridization theory is used to interpret the optical tunability. The interaction between the plasmon modes on the inner core and the outer shell results in dual resonances. The low-energy dipole mode is red-shifted by reducing the spacing (i.e., the intermediate silica layer) between the core and the shell. This extra tunability allows the plasmon resonance of a multilayer nanoshell to be tuned to the near-infrared region from a visible silica-gold nanoshell whose gold shell cannot be further reduced in thickness. For multilayer nanoshells with reduced geometrical symmetry (i.e., the inner core is offset from the center), modes of different orders interact. The mixed interaction introduces the dipolar (bright) characteristic into the higher-order (dark) modes and improves their coupling efficiency to the excitation light. The excitation of the dark modes attenuates and red-shifts the dipole mode and gives it higher-order characteristics. For non-plasmonic structures, simulations have demonstrated that multilayered structures can either reduce or enhance the scattering of light. By adding an anti-reflection layer to a microsphere made of a high-index material, the scattering force can be dramatically reduced. The reduced scattering allows optical trapping of high-index particles. Additionally, the improved trapping is not largely sensitive to the refractive index or the thickness of the coating. The technique has the practical potential to lower the requirement on the numerical aperture of the microscope objectives, making possible the integration of the imaging and optical trapping systems.
While the anti-reflection coating reduces scattering, the photothermal bubble (PTB) generated by gold nanoparticles by and large enhances the scattering of light. Transient PTBs are generated by super-heating gold nanoparticles with short laser pulses. Mie-based simulations predict that the scattering of PTBs strongly depends on the transient environment immediately surrounding the nanoparticles. A scattering enhancement of two to four orders of magnitude from PTBs is demonstrated from both calculations and experiments. Lastly, the near-field coupling between different plasmonic structures for surface-enhanced Raman scattering is investigated. A gold-coated silicon-germanium nanocone substrate has been fabricated and characterized. Finite-element simulations reveal that individual nanocones generate strong tip enhancement with axially polarized light (i.e., light polarized along the vertical axis of the nanocone) while the enhancement from transversely polarized light (i.e., light polarized in the plane of the substrate) is relatively weak. By simply filling the valleys between nanocones with plasmonic gold nanoparticles, the performance of the substrate is improved with in-plane excitation. Simulations reveal strong coupling between nanoparticles and adjacent nanocones with transverse excitations. An improvement of over one order of magnitude has been experimentally observed.
Molecules in motion: Computing structural flexibility
(2008) Shehu, Amarda; Kavraki, Lydia E.
Growing databases of protein sequences in the post-genomic era call for computational methods to extract structure and function from a protein sequence. In flexible molecules like proteins, function cannot be reliably extracted from a few structures. The amino-acid chain assumes various spatial arrangements (conformations) to modulate biological function. Characterizing the flexibility of a protein under physiological (native) conditions remains an open problem in computational biology. This thesis addresses the problem of characterizing the native flexibility of a protein by computing conformations populated under native conditions. Such computation involves locating free-energy minima in a high-dimensional conformational space. The methods proposed in this thesis search for native conformations using systematically less information from experiment: first employing an experimental structure, then using only a closure constraint in cyclic cysteine-rich peptides, and finally employing only the amino-acid sequence of small- to medium-size proteins. A novel method is proposed to compute structural fluctuations of a protein around an experimental structure. The method combines a robotics-inspired exploration of the conformational space with a statistical mechanics formulation. Thermodynamic quantities measured over generated conformations reproduce experimental data of broad time scales on small (∼ 100 amino acids) proteins with non-concerted motions. Capturing concerted motions motivates the development of the next methods. A second method is proposed that employs a closure constraint to generate native conformations of cyclic cysteine-rich peptides. The method first explores the entire conformational space, then explores in present energy minima until no lower-energy minima emerge. The method captures relevant features of the native state also observed in experiment for 20–30 amino-acid long peptides. 
A final method is proposed that implements a similar exploration but for longer proteins and employing only amino-acid sequence. In its first stage, the method explores the entire conformational space at a coarse-grained level of detail. A second stage focuses the exploration on low-energy regions in more detail. All-atom conformational ensembles are obtained for proteins that populate various functional states through large-scale concerted motions. These ensembles capture well the populated functional states of proteins up to 214 amino acids long.
