Browsing by Author "Varman, Peter J."
Now showing 1 - 20 of 43
Analysis of simple randomized buffer management for parallel I/O (Elsevier, 2004-04)
Kallahalla, Mahesh; Varman, Peter J.; Electrical and Computer Engineering; Varman Laboratory
Buffer management for a D-disk parallel I/O system is considered in the context of randomized placement of data on the disks. A simple prefetching and caching algorithm PHASE-LRU using bounded lookahead is described and analyzed. It is shown that PHASE-LRU performs an expected number of I/Os that is within a factor Θ(log D / log log D) of the number performed by an optimal off-line algorithm. In contrast, any deterministic buffer …

ASP: Adaptive Online Parallel Disk Scheduling (1999)
Kallahalla, Mahesh; Varman, Peter J.
In this work we address the problems of prefetching and I/O scheduling for read-once reference strings in a parallel I/O system. We use the standard parallel disk model with D disks and a shared I/O buffer of size M. We design an on-line algorithm ASP (Adaptive Segmented Prefetching) with ML-block lookahead, L ≥ 1, and compare its performance to the best on-line algorithm with the same lookahead. We show that for any reference string the number of I/Os done by ASP is within a factor Θ(C), C = min{√L, D^(1/3)}, of the number of I/Os done by the optimal algorithm with the same amount of lookahead.
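A minimal sketch of the phase idea behind these two prefetching results (hypothetical code, not from either paper): consume a read-once reference string in lookahead windows of M blocks, fetching each window in parallel I/O steps that read at most one block per disk. With random placement, the busiest disk in a window determines the step count, which is where a balls-into-bins factor like Θ(log D / log log D) enters.

```python
import random
from collections import defaultdict

def phase_io_count(refs, M, D, seed=0):
    """Toy model: consume a read-once reference string in phases of M
    references; each phase is fetched with one-block-per-disk I/O steps."""
    rng = random.Random(seed)
    disk_of = {}                        # random placement: block -> disk
    steps = 0
    for i in range(0, len(refs), M):
        phase = refs[i:i + M]           # bounded lookahead: next M references
        per_disk = defaultdict(int)     # blocks needed from each disk
        for block in set(phase):
            d = disk_of.setdefault(block, rng.randrange(D))
            per_disk[d] += 1
        steps += max(per_disk.values(), default=0)  # busiest disk bounds the phase
    return steps

print(phase_io_count(list(range(1024)), M=64, D=8))  # parallel I/O steps used
```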
Automated Detection and Differential Diagnosis of Non-small Cell Lung Carcinoma Cell Types Using Label-free Molecular Vibrational Imaging (2012-09-05)
Hammoudi, Ahmad; Varman, Peter J.; Massoud, Yehia; Wong, Stephen T. C.; Clark, John W., Jr.; Aazhang, Behnaam
Lung carcinoma is the most prevalent type of cancer in the world, considered to be a relentlessly progressive disease with dismal mortality rates. Recent advances in targeted therapy hold the promise of delivering better, more effective treatments to lung cancer patients that could significantly enhance their survival rates. Optimizing care delivery through targeted therapies requires the ability to effectively identify and diagnose lung cancer and to identify the lung cancer cell type specific to each patient: small cell carcinoma, adenocarcinoma, or squamous cell carcinoma. Label-free optical imaging techniques such as Coherent Anti-Stokes Raman Scattering (CARS) microscopy have the potential to provide physicians with minimally invasive access to lung tumor sites, and thus allow for better cancer diagnosis and sub-typing. To maximize the benefits of such novel imaging techniques in enhancing cancer treatment, it is essential to develop new data analysis methods that can rapidly and accurately analyze the new types of data they provide. Recent studies have gone a long way toward achieving those goals but still face significant bottlenecks that hinder the ability to fully exploit the diagnostic potential of CARS images: the diagnosis process could not be streamlined because cancer cells could not be detected automatically, and detected cells could not be reliably classified into their respective cell types. More specifically, data analysis methods have thus far been incapable of correctly identifying and differentiating the different non-small cell lung carcinoma cell types, a stringent requirement for optimal therapy delivery. In this study we have addressed the two bottlenecks named above by designing an image processing framework that is capable of automatically and accurately detecting cancer cells in two- and three-dimensional CARS images. Moreover, we built upon this capability with a new approach to analyzing the segmented data that provided significant information about the cancerous tissue and ultimately allowed for the automatic differential classification of non-small cell lung carcinoma cell types with superb accuracy.

Balancing Fairness and Efficiency in Tiered Storage Systems using Bottleneck-Aware Allocation (USENIX: The Advanced Computing Systems Association, 2014-02)
Varman, Peter J.; Wang, Hui
Multi-tiered storage made up of heterogeneous devices raises new challenges in allocating throughput fairly among concurrent clients. The fundamental problem is finding an appropriate balance between fairness to the clients and maximizing system utilization. In this paper we cast the problem within the broader framework of fair allocation for multiple resources. We present a new allocation model, BAA, based on the notion of per-device bottleneck sets. Clients bottlenecked on the same device receive throughputs in proportion to their fair shares, while allocation ratios between clients in different bottleneck sets are chosen to maximize system utilization. We show formally that BAA satisfies the fairness properties of Envy Freedom and Sharing Incentive. We evaluated the performance of our method using both simulation and implementation on a Linux platform. The experimental results show that our method can provide both high efficiency and fairness.

Bridging the Programming Gap between Persistent and Volatile Memory using WrAP (2013-05)
Giles, Ellis; Doshi, Kshitij; Varman, Peter J.
Advances in memory technology are promising the availability of byte-addressable persistent memory as an integral component of future computing platforms. This change has significant implications for software that has traditionally made a sharp distinction between durable and volatile storage. In this paper we describe a software-hardware architecture, WrAP, for persistent memory that provides atomicity and durability while simultaneously ensuring that fast paths through the cache, DRAM, and persistent memory layers are not slowed down by burdensome buffering or double-copying requirements. Trace-driven simulation of transactional data structures indicates the potential for significant performance gains using the WrAP approach.
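A much-simplified sketch of the bottleneck-aware idea in the BAA item above (illustrative only: BAA chooses the allocation ratios across bottleneck sets by optimization, which this toy omits; all client names and cost numbers are hypothetical). Each client is classed by the device its I/O mix saturates first, and clients sharing a bottleneck split that device's capacity in proportion to their fair-share weights.

```python
def baa_toy(clients, capacity):
    """clients: {name: (weight, per_io_cost)} where per_io_cost[d] is the
    service time one I/O of this client consumes on device d.
    capacity: available service time on each device."""
    ndev = len(capacity)
    # A client's bottleneck is the device its I/O mix saturates first.
    sets = {d: [] for d in range(ndev)}
    for name, (w, cost) in clients.items():
        sets[max(range(ndev), key=lambda d: cost[d] / capacity[d])].append(name)
    alloc = {}
    for d, members in sets.items():
        total_w = sum(clients[m][0] for m in members)
        for m in members:
            w, cost = clients[m]
            # weight-proportional share of the bottleneck device, in IOPS
            alloc[m] = (w / total_w) * capacity[d] / cost[d]
    return alloc

# Client A is SSD-heavy; client B mostly hits the disk tier:
print(baa_toy({"A": (1, (0.002, 0.0)), "B": (1, (0.001, 0.01))},
              capacity=(1.0, 1.0)))
```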
Classification Techniques for Undersampled Electromyography and Electrocardiography (2012-10-01)
Wilhelm, Keith; Varman, Peter J.; Massoud, Yehia; Clark, John W., Jr.; Koushanfar, Farinaz
Electrophysiological signals including electrocardiography (ECG) and electromyography (EMG) are widely used in clinical environments for monitoring of patients and for diagnosis of conditions including cardiac and neuromuscular disease. Due to the wealth of information contained in these signals, many additional applications would be facilitated by full-time acquisition combined with automated analysis. Recent performance gains in portable computing devices and large-scale computing platforms provide the necessary computational resources to process and store this data; however, challenges at the sensor level have prevented monitoring systems from reaching the practicality and convenience necessary for widespread, continuous use. In this thesis, we examine the feasibility of applying techniques from the compressive sensing field to the acquisition and analysis of electrophysiological signals. These techniques allow signals to be acquired in compressed form, thereby providing a means to reduce the power consumption of monitoring devices. We demonstrate the effects of several methods of compressive sampling and reconstruction on standard compression and reconstruction error metrics. Additionally, we investigate the effects of compressive sensing on the accuracy of automated signal analysis techniques for extracting useful information from ECG and EMG signals.

Communication efficient parallel algorithms for nonnumerical computations (1988)
Doshi, Kshitij Arun; Varman, Peter J.
The broad goal of this research is to develop a set of paradigms for mapping data-dependent symbolic computations on realistic models of parallel architectures. Within this goal, the thesis represents the initial effort to achieve efficient parallel solutions for a number of non-numerical problems on networks of processors. The specific contributions of the thesis are new parallel algorithms, exhibiting linear speedup on architectures consisting of fixed numbers of processors (i.e., bounded models). The following problems have been considered in the thesis: (1) Determine the minimum spanning tree (MST), and identify the bridges and articulation points (APs) of an undirected weighted graph represented by an $n \times n$ adjacency matrix. (2) The pattern matching problem: given two strings of characters, of lengths $m$ and $n$ ($n \geq m$) respectively, mark all positions in the second string where there appears an instance of the first string. (3) Sort $n$ elements. For each problem, we use a processor network consisting of $p$ processors. The network model used in the solution of the first set of problems is the linear array, while that used in the solutions of the second and third problems is a butterfly-connected system. The solutions on the butterfly-connected system apply also to a pipelined hypercube. The performances of the solutions are summarized below. (1) For a graph on $n$ vertices represented by a distributed adjacency matrix, we present a solution for the MST problem that requires $O(n^2/p + n + p)$ time for execution. We present novel data reduction schemes for identifying the bridges and articulation points. (2) The string matching solution requires time $O((n + m)/p + \log^2 p)$, where $n$ and $m$ are the lengths of the two strings. No previous parallel solutions achieving linear speedups have been proposed on networks of processors. (3) The execution time requirements of the sorting algorithm are $O((n/p)\log n + \log^2 p)$, which represents a linear speedup up to the use of $n/\log n$ processors. A previous solution achieved linear speedup on a $2^{\sqrt{\log n}}$-processor binary cube. A new parallel merging procedure is presented in the algorithm. Also, as part of the algorithm, a new routing operation called Forward-copy is shown to result in conflict-free communication on the butterfly. (Abstract shortened with permission of author.)
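As a quick check that the sorting bound quoted in the item above really is a linear speedup, compare it with a sequential $O(n \log n)$ sort (a worked equation added for clarity, not taken from the thesis):

```latex
S(p) \;=\; \frac{n \log n}{\frac{n}{p}\log n + \log^2 p} \;=\; \Theta(p)
\quad \text{whenever} \quad \frac{n}{p}\log n \;\ge\; \log^2 p .
```

For every $p \le n/\log n$ the condition holds, since then $(n/p)\log n \ge \log^2 n \ge \log^2 p$; this is exactly the stated limit of $n/\log n$ processors.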
Competitive Parallel Disk Prefetching and Buffer Management (Rice University, 2000)
Barve, Rakesh; Kallahalla, Mahesh; Varman, Peter J.
We provide a competitive analysis framework for online prefetching and buffer management algorithms in parallel I/O systems, using a read-once model of block references. This has widespread applicability to key I/O-bound applications such as external merging and concurrent playback of multiple video streams. Two realistic lookahead models, global lookahead and local lookahead, are defined. Algorithms NOM and GREED, based on these two forms of lookahead, are analyzed for shared-buffer and distributed-buffer configurations, both of which occur frequently in existing systems. An important aspect of our work is that we show how to implement both models of lookahead in practice using the simple techniques of forecasting and flushing. Given a D-disk parallel I/O system and a globally shared I/O buffer that can hold up to M disk blocks, we derive a lower bound of Ω(√D) on the competitive ratio of any deterministic online prefetching algorithm with O(M) lookahead. NOM is shown to match the lower bound using global M-block lookahead. In contrast, using only local lookahead results in an Ω(D) competitive ratio. When the buffer is distributed into D portions of M/D blocks each, the algorithm GREED based on local lookahead is shown to be optimal, and NOM is within a constant factor of optimal. Thus we provide a theoretical basis for the intuition that global lookahead is more valuable for prefetching in the case of a shared buffer configuration, whereas it is enough to provide local lookahead in the case of a distributed configuration. Finally, we analyze the performance of these algorithms for reference strings generated by a uniformly-random stochastic process and we show that they achieve the minimal expected number of I/Os. These results also give bounds on the worst-case expected performance of algorithms which employ randomization in the data layout.

Competitive prefetching and buffer management for parallel I/O systems (1997)
Kallahalla, Mahesh; Varman, Peter J.
In this thesis we study prefetching and buffer management algorithms for parallel I/O systems. Two models of lookahead, global and local, which give limited information regarding future accesses are introduced. Two configurations of the I/O buffer, shared and distributed, are considered, based upon the accessibility of the I/O buffer. The performance of prefetching algorithms using the two forms of lookahead is analyzed in the framework of competitive analysis, for read-once access patterns. Two algorithms, PHASE and GREED, which match the lower bounds are presented. A randomized version of GREED that performs the minimal expected number of I/Os is designed and applied to the problems of external sorting and video retrieval. Finally the problem of designing prefetching and buffer management algorithms for read-many reference strings is examined. An algorithm which uses randomized write-back to attain good expected I/O performance is presented.
Defragmenting the Cloud Using Demand-based Resource Allocation (ACM SIGMETRICS '13, 2013-06)
Gulati, Ajay; Varman, Peter J.
Current public cloud offerings sell capacity in the form of pre-defined virtual machine (VM) configurations to their tenants. Typically this means that tenants must purchase individual VM configurations based on the peak demands of their applications, or be restricted to scale-out applications that can share a pool of VMs. This diminishes the value proposition of moving to a public cloud as compared to server consolidation in a private virtualized datacenter, where one gets the benefits of statistical multiplexing between VMs belonging to the same or different applications. Ideally one would like to enable a cloud tenant to buy capacity in bulk and benefit from statistical multiplexing among its workloads. This requires the purchased capacity to be dynamically and transparently allocated among the tenant's VMs, which may be running on different servers, even across datacenters. In this paper, we propose two novel algorithms called BPX and DBS that provide the cloud customer with the abstraction of buying capacity in bulk. These algorithms dynamically allocate the bulk capacity purchased by a customer between its VMs based on their individual demands and user-set importance. Our algorithms are highly scalable and are designed to work in a large-scale distributed environment. We implemented a prototype of BPX as part of VMware's management software and showed that BPX is able to closely mimic the behavior of a centralized allocator in a distributed manner.
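The bulk-capacity abstraction just described can be illustrated with a small water-filling sketch (hypothetical code, not the paper's BPX or DBS, which are distributed): capacity is split among a tenant's VMs in proportion to user-set importance, no VM receives more than its demand, and the surplus from demand-capped VMs is recycled to the others.

```python
def demand_based_split(capacity, vms):
    """vms: {name: (importance, demand)}. Returns a per-VM allocation.
    Water-filling: repeatedly hand out importance-proportional shares,
    capping each VM at its demand and recycling the surplus."""
    alloc = {name: 0.0 for name in vms}
    active, remaining = dict(vms), float(capacity)
    while active and remaining > 1e-9:
        total_imp = sum(imp for imp, _ in active.values())
        spent, still_hungry = 0.0, {}
        for name, (imp, demand) in active.items():
            give = min(remaining * imp / total_imp, demand - alloc[name])
            alloc[name] += give
            spent += give
            if alloc[name] < demand - 1e-9:
                still_hungry[name] = (imp, demand)
        if spent <= 1e-9:          # every VM demand-capped; capacity left over
            break
        remaining -= spent
        active = still_hungry
    return alloc

# 100 units of purchased capacity; "web" is twice as important:
print(demand_based_split(100, {"web": (2, 30), "db": (1, 90), "batch": (1, 90)}))
# -> web capped at its demand (30); db and batch split the rest (35 each)
```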
Design of a graphical input simulation tool for extended queueing network models (1985)
Madala, Sridhar; Sinclair, James B.; Briggs, Faye A.; Varman, Peter J.
In analyzing the performance of computer and digital communication systems, contention for finite-capacity resources is often seen to be a dominant factor. Extended Queueing Network (EQN) models are appropriate for modeling such systems and have been used with considerable success. EQN models can be solved by exact analysis, by approximate analysis, or by simulation. This thesis is concerned with the design of a high-level tool for solving EQN models by simulation that accepts specifications in a natural graphical manner. The primary motivation for this research was to prove the feasibility of providing a versatile tool that is easy to learn and use and is complete with respect to EQN models. Existing software tools for EQN modeling do not take advantage of the fact that a natural way to specify such models is graphical. Our modeling tool, the Graphical Input Simulation Tool (GIST), achieves these objectives (1) by utilizing a transaction-oriented approach as opposed to a language-based approach, (2) by providing two user interfaces, a graphical interface and a textual interface, that permit specification of EQN models at a very high level of abstraction, and (3) by means of a versatile set of modeling abstractions. In terms of modeling capabilities, GIST provides analogs of most abstractions commonly found in other high-level EQN modeling tools and also includes abstractions that have no counterparts in other tools. We demonstrate the feasibility and utility of providing a GIST-like tool and conclude that the transaction-oriented approach is also applicable and appropriate for building modeling tools in areas other than EQNs. GIST can also be extended to further research in performance evaluation tools and in performance evaluation in general.

Design Techniques for Robust Analog Signal Acquisition (2012-10-02)
Singal, Vikas; Varman, Peter J.; Massoud, Yehia; Clark, John W., Jr.; Koushanfar, Farinaz
The random demodulator architecture is a compressive-sensing-based receiver that allows the reconstruction of frequency-sparse signals from measurements acquired at a rate below the signal's Nyquist rate. This in turn results in tremendous power savings in receivers because of the direct correlation between the power consumption of analog-to-digital converters (ADCs) in communication receivers and the sampling rate at which these ADCs operate. In this thesis, we propose design techniques for a robust and efficient random demodulator. We tackle the two most critical components: the resetting mechanism of the integrator and the random sequence. On the one hand, the resetting mechanism can pose challenges in practical settings that can degrade the performance of the random demodulator. We propose practical approaches to mitigate the effect of resetting and propose resetting schemes that provide robust performance. On the other hand, the random sequence is a central part of the system, and its properties directly affect the properties of the whole system. We study the performance of the random demodulator under many practical random sequences, such as maximal-length sequences and Kasami sequences, and provide pros and cons of using each in the random demodulator.

Efficient Architectures for Wideband Receivers (2012-08-29)
El Smaili, Sami; Varman, Peter J.; Massoud, Yehia; Clark, John W., Jr.; Koushanfar, Farinaz
Reducing the power consumption of radio receivers is becoming more critical with the advancement of biomedical portable and implantable devices due to the stringent power requirements of such applications. Compressive sensing promises to tremendously reduce the power of radio receivers by allowing the reconstruction of sparse signals from measurements acquired at a sub-Nyquist rate. A key component in compressive sensing systems is the random signal which is used to acquire the measurements. Most efforts have been devoted to the design of signals with high randomness, but little attention has been devoted to manipulating the random signal to suit a specific application, meet certain specifications, or enhance the performance of the system. This thesis tackles compressive sensing systems from this angle. We first propose an architecture that alleviates a critical requirement in compressive sensing: that the random signal should run at the Nyquist rate, which becomes prohibitive as the signal bandwidth increases. We provide theoretical and experimental results that demonstrate the effectiveness of the proposed architecture. Secondly, we propose a framework for manipulating the random signal in the frequency domain as suitable for specific applications. We use the framework to develop an architecture for reconfigurable ultra-wideband radios.
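A minimal numpy model of the random demodulator discussed in the two theses above (a sketch under generic assumptions, not either thesis's circuit): the input is mixed with a pseudo-random ±1 chipping sequence at the Nyquist rate, then integrated and dumped at a rate R times lower, so each output sample is a random projection of a window of the signal.

```python
import numpy as np

def random_demodulator(x, R, rng=np.random.default_rng(0)):
    """x: Nyquist-rate samples; R: downsampling factor.
    Returns len(x) // R sub-Nyquist measurements."""
    n = (len(x) // R) * R
    chips = rng.choice([-1.0, 1.0], size=n)   # pseudo-random +/-1 sequence
    mixed = x[:n] * chips                      # mixing smears sparse tones
    return mixed.reshape(-1, R).sum(axis=1)    # integrate-and-dump per window

t = np.arange(1024)
x = np.cos(2 * np.pi * 0.11 * t)               # frequency-sparse test input
y = random_demodulator(x, R=8)                 # 128 measurements from 1024 samples
print(y.shape)
```

Recovering x from y is then a sparse-recovery problem, which is where the resetting mechanism and the choice of chipping sequence studied in these theses come in.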
Efficient Archivable Time Index: A Dynamic Indexing Scheme for Temporal Data (McGraw Hill, 1994)
Verma, Rakesh M.; Varman, Peter J.
We present a practical and asymptotically optimal indexing structure for a versioned timestamped database with step-wise constant data. Three version operations (insertions, updates, and deletes) are allowed for the present version, whereas query operations are allowed for any version, present or past. Snapshot and time-range queries can be answered optimally with this structure. As a two-level index, attribute-search and attribute-history queries can be solved in time proportional to the output size plus an additive logarithmic term. The time index uses linear storage; this improves upon previous work, which had either logarithmic query overhead and quadratic space, or linear space and linear query overhead. The tradeoff is a small increase in the time for version operations, from constant to logarithmic. All measures are worst-case. The index has a natural structure for archiving on write-once storage media like optical disks.

An Efficient Multiversion Access Structure (2007-05)
Varman, Peter J.; Verma, Rakesh M.
An efficient multiversion access structure for a transaction-time database is presented. Our method requires optimal storage and query times for several important queries and logarithmic update times. Three version operations (inserts, updates, and deletes) are allowed on the current database, while queries are allowed on any version, present or past. The following query operations are performed in optimal query time: key range search, key history search, and time range view. The key-range query retrieves all records having keys in a specified key range at a specified time; the key history query retrieves all records with a given key in a specified time range; and the time range view query retrieves all records that were current during a specified time interval. Special cases of these queries include the key search query, which retrieves a particular version of a record, and the snapshot query, which reconstructs the database at some past time. To the best of our knowledge no previous multiversion access structure simultaneously supports all these query and version operations within these time and space bounds. The bounds on query operations are worst-case per operation, while those for storage space and version operations are (worst-case) amortized over a sequence of version operations. Simulation results show that good storage utilization and query performance are obtained.
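The queries these two index structures answer can be made concrete with a toy transaction-time store (hypothetical code: a flat per-key scheme for clarity, nothing like the tree-structured indexes that achieve the papers' optimal bounds). Each key holds a timestamped version list, assumed appended in increasing timestamp order; a key search at time t is a binary search, and a snapshot query replays one such search per key.

```python
import bisect

class ToyMultiversion:
    """Toy transaction-time store: per-key timestamped version lists."""
    def __init__(self):
        self.hist = {}                   # key -> ([timestamps], [values])

    def put(self, key, value, ts):       # insert or update at time ts
        times, vals = self.hist.setdefault(key, ([], []))
        times.append(ts); vals.append(value)   # assumes increasing ts

    def delete(self, key, ts):           # logical delete: a None version
        self.put(key, None, ts)

    def get(self, key, ts):              # key search in any version
        times, vals = self.hist.get(key, ([], []))
        i = bisect.bisect_right(times, ts) - 1
        return vals[i] if i >= 0 else None

    def snapshot(self, ts):              # reconstruct the database at time ts
        return {k: v for k in self.hist
                if (v := self.get(k, ts)) is not None}

db = ToyMultiversion()
db.put("a", 1, ts=1); db.put("a", 2, ts=5); db.delete("a", ts=9)
print(db.get("a", 6), db.snapshot(2))    # -> 2 {'a': 1}
```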
Energy Accounting and Optimization for Mobile Systems (2013-09-16)
Dong, Mian; Zhong, Lin; Cavallaro, Joseph R.; Varman, Peter J.; Sarkar, Vivek
Energy accounting determines how much a software process contributes to the total system energy consumption. It is the foundation for evaluating software and has been widely used in operating-system-based energy management. While various energy accounting policies have been tried, there is no known way to evaluate them directly, simply because it is hard to track every hardware use by software in a heterogeneous multicore system like a modern smartphone or tablet. This work provides the ground truth for energy accounting based on multi-player game theory and offers the first evaluation of existing energy accounting policies, revealing their important flaws. The proposed ground truth is based on the Shapley value, a single-value solution to multi-player games whose four axiomatic properties are natural and self-evident for energy accounting. This work further provides a utility optimization formulation of energy management and shows, surprisingly, that energy accounting does not matter for existing energy management solutions that control the energy use of a process by giving it an energy budget, or budget-based energy management (BEM). This work shows that an optimal energy management (OEM) framework can always outperform BEM. While OEM does not require any form of energy accounting, it is related to the Shapley value in that both require the system energy consumption for all possible combinations of the processes under question. This work reports a prototype implementation of both Shapley-value-based energy accounting and OEM-based scheduling. Using this prototype and smartphone workloads, this work experimentally demonstrates how erroneous existing energy accounting policies can be, and shows that existing BEM solutions are unnecessarily complicated yet underperform OEM by 20%.
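The proposed ground truth can be stated compactly: a process's share is its Shapley value with respect to the coalition energy function. A brute-force sketch follows (exponential in the number of processes, which is why it serves as ground truth rather than an online policy; `measure` is a hypothetical oracle returning system energy for a given set of running processes, and the numbers below are made up).

```python
from itertools import permutations

def shapley_energy(processes, measure):
    """Shapley-value accounting: average each process's marginal energy
    over all orderings in which the process set could be switched on."""
    shares = {p: 0.0 for p in processes}
    orders = list(permutations(processes))
    for order in orders:
        running = set()
        for p in order:
            before = measure(frozenset(running))
            running.add(p)
            shares[p] += measure(frozenset(running)) - before
    return {p: s / len(orders) for p, s in shares.items()}

# Hypothetical measurements (joules) for every subset of two processes;
# the radio is shared, so energy is sub-additive:
table = {frozenset(): 1.0, frozenset({"gps"}): 3.0,
         frozenset({"net"}): 2.5, frozenset({"gps", "net"}): 3.8}
print(shapley_energy(["gps", "net"], table.__getitem__))
# -> {'gps': 1.65, 'net': 1.15}; shares sum to 3.8 - 1.0 (total above idle)
```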
Explicit or Symbolic Translation of Linear Temporal Logic to Automata (2013-07-24)
Rozier, Kristin Yvonne; Vardi, Moshe Y.; Kavraki, Lydia E.; Varman, Peter J.
Formal verification techniques are growing increasingly vital for the development of safety-critical software and hardware in practice. Techniques such as requirements-based design and model checking for system verification have been successfully used to verify systems for air traffic control, airplane separation assurance, autopilots, CPU logic designs, life support, medical equipment, and other functions that ensure human safety. Formal behavioral specifications written early in the system-design process and communicated across all design phases increase the efficiency, consistency, and quality of the system under development. We argue that to prevent introducing design or verification errors, it is crucial to test specifications for satisfiability. We advocate for the adaptation of a new sanity check via satisfiability checking for property assurance. Our focus here is on specifications expressed in Linear Temporal Logic (LTL). We demonstrate that LTL satisfiability checking reduces to model checking, and that satisfiability checks for the specification, its complement, and the conjunction of all properties should be performed as a first step in LTL model checking. We report on an experimental investigation of LTL satisfiability checking. We introduce a large set of rigorous benchmarks to enable objective evaluation of LTL-to-automaton algorithms in terms of scalability, performance, correctness, and size of the automata produced. For explicit model checking, we use the Spin model checker; we tested all LTL-to-explicit-automaton translation tools that were publicly available when we conducted our study. For symbolic model checking, we use CadenceSMV, NuSMV, and SAL-SMC both for LTL-to-symbolic-automaton translation and to perform the satisfiability check. Our experiments result in two major findings. First, scalability, correctness, and other debilitating performance issues afflict most LTL translation tools. Second, for LTL satisfiability checking, the symbolic approach is clearly superior to the explicit approach. Ironically, the explicit approach to LTL-to-automata had been heavily studied while only one algorithm existed for LTL-to-symbolic automata. Since 1994, there had been essentially no new progress in encoding symbolic automata for BDD-based analysis. Therefore, we introduce a set of 30 symbolic automata encodings. The set consists of novel combinations of existing constructs, such as different LTL formula normal forms, with a novel transition-labeled symbolic automaton form, a new way to encode transitions, and new BDD variable orders based on algorithms for tree decomposition of graphs. An extensive set of experiments demonstrates that these encodings translate to significant, sometimes exponential, improvement over the current standard encoding for symbolic LTL satisfiability checking. Building upon these ideas, we return to the explicit-automata domain and focus on the most common type of specification used in industrial practice: safety properties. We show that we can exploit the inherent determinism of safety properties to create a set of 26 explicit automata encodings comprising novel aspects including: state numbers versus state labels versus a state look-up table, finite versus infinite acceptance conditions, forward-looking versus backward-looking transition encodings, assignment-based versus BDD-based alphabet representation, state and transition minimization, edge abbreviation, trap-state elimination, and determinization either on-the-fly or up-front using the subset construction. We conduct an extensive experimental evaluation and identify an encoding that offers the best performance in explicit LTL model checking time and is consistently faster than the previous best explicit automaton encoding algorithm.

High Performance Reliable Variable Latency Carry Select Addition (2012)
Du, Kai; Varman, Peter J.
This thesis describes the design and optimization of a low-overhead, high-performance variable latency carry select adder. Previous researchers believed that the traditional adder had reached the theoretical speed bound. However, a considerable portion of the hardware resources of the traditional adder is only used in the worst case. Based on this observation, variable latency adders have been proposed to improve on the theoretical limit, but such adders incur significant area overhead. By combining previous variable latency adders with carry select addition, this work describes a novel variable latency carry select adder. Applying carry select addition in the variable latency adder design significantly reduces the area overhead and increases performance. This variable latency adder is faster and smaller than previous variable latency adders. Furthermore, it can be optimized to be faster and smaller than the fastest adder generated by the Synopsys DesignWare building block IP.

Markov analysis of multiple-disk prefetching strategies for external merging (Elsevier Science Publishers Ltd. Essex, UK, 1994-06-06)
Sadananda Pai, Vinay; Schaffer, Alejandro A.; Varman, Peter J.
Multiple-disk organizations can be used to improve the I/O performance of problems like external merging. Concurrency can be introduced by overlapping I/O requests at different disks and by prefetching additional blocks on each I/O operation. To support this prefetching, a memory cache is required. Markov models for two prefetching strategies are developed and analyzed. Closed-form expressions for the average parallelism obtainable for a given cache size and number of disks are derived for both prefetching strategies. These analytic results are confirmed by simulation.
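The carry select mechanism behind the variable latency adder thesis two items above can be modeled at the bit level in a few lines (a hypothetical software toy: real designs are gate-level, and the thesis adds variable-latency speculation on top of this structure). Each block computes its sum twice, once per assumed carry-in, and a multiplexer picks the correct copy as soon as the true carry arrives, so carries need not ripple through every bit.

```python
def ripple(a, b, carry_in, width):
    """Plain addition of two width-bit blocks: (sum bits, carry out)."""
    s = a + b + carry_in
    return s & ((1 << width) - 1), s >> width

def carry_select_add(a, b, width=16, block=4):
    """Carry select: each block is precomputed for carry-in 0 and 1,
    then selected by the incoming carry (the fast mux path in hardware)."""
    result, carry = 0, 0
    mask = (1 << block) - 1
    for i in range(0, width, block):
        a_i, b_i = (a >> i) & mask, (b >> i) & mask
        sum0 = ripple(a_i, b_i, 0, block)   # speculative: carry-in = 0
        sum1 = ripple(a_i, b_i, 1, block)   # speculative: carry-in = 1
        s, carry = sum1 if carry else sum0  # mux on the actual carry
        result |= s << i
    return result | (carry << width)

assert carry_select_add(0xBEEF, 0x1234) == (0xBEEF + 0x1234) & 0x1FFFF
print(hex(carry_select_add(0xBEEF, 0x1234)))
```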
Module assignment in distributed systems (1984)
Lu, Mi; Sinclair, James B.; Varman, Peter J.; Jump, J. Robert
The problem of finding an optimal assignment of a modular program for n processors in a distributed system is studied. We characterize the distributed programs by Stone's graph model and attempt to find an assignment of modules to processors which minimizes the sum of module execution costs and intermodule communication costs. The problem is NP-complete for more than three processors. We first show how to identify all modules which must be assigned to a particular processor under any optimal assignment. This usually results in a significant reduction in the complexity of the optimal assignment problem. We also present a heuristic algorithm for finding assignments and experimentally verify that it almost always finds an optimal assignment.
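Stone's model as used in this last item can be made concrete with a tiny brute-force evaluator (hypothetical code; enumeration is exponential, which is exactly why the thesis develops module-fixing reductions and a heuristic instead). An assignment's cost is each module's execution cost on its assigned processor plus the communication cost of every intermodule edge that crosses processors.

```python
from itertools import product

def assignment_cost(assign, exec_cost, comm_cost):
    """assign[m] = processor of module m; exec_cost[m][p] = run cost of
    module m on processor p; comm_cost[(m1, m2)] is paid only when the
    two modules land on different processors."""
    run = sum(exec_cost[m][assign[m]] for m in range(len(assign)))
    talk = sum(c for (m1, m2), c in comm_cost.items()
               if assign[m1] != assign[m2])
    return run + talk

def brute_force_optimal(n_modules, n_procs, exec_cost, comm_cost):
    return min(product(range(n_procs), repeat=n_modules),
               key=lambda a: assignment_cost(a, exec_cost, comm_cost))

exec_cost = [[1, 4], [3, 2], [2, 2]]      # 3 modules x 2 processors (made up)
comm_cost = {(0, 1): 5, (1, 2): 1}        # chatty pair (0, 1) resists splitting
best = brute_force_optimal(3, 2, exec_cost, comm_cost)
print(best, assignment_cost(best, exec_cost, comm_cost))  # -> (0, 0, 0) 6
```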