Computer Science Technical Reports

Recent Submissions

  • Item
    How Much Do Unstated Problem Constraints Limit Deep Robotic Reinforcement Learning?
    (2019) Lewis, W. Cannon II; Moll, Mark; Kavraki, Lydia E.
    Deep Reinforcement Learning is a promising paradigm for robotic control which has been shown to be capable of learning policies for high-dimensional, continuous control of unmodeled systems. However, Robotic Reinforcement Learning currently lacks clearly defined benchmark tasks, which makes it difficult for researchers to reproduce and compare against prior work. “Reacher” tasks, which are fundamental to robotic manipulation, are commonly used as benchmarks, but the lack of a formal specification elides details that are crucial to replication. In this paper we present a novel empirical analysis which shows that the unstated spatial constraints in commonly used implementations of Reacher tasks make it dramatically easier to learn a successful control policy with Deep Deterministic Policy Gradients (DDPG), a state-of-the-art Deep RL algorithm. Our analysis suggests that less constrained Reacher tasks are significantly more difficult to learn, and hence that existing de facto benchmarks are not representative of the difficulty of general robotic manipulation.
  • Item
    Runtime Support for Distributed Sharing in Strongly-Typed Languages
    (1999-11-13) Cox, Alan L.; Hu, Y. Charlie; Wallach, Dan S.; Yu, Weimin; Zwaenepoel, Willy
    In this paper, we present a new run-time system for strongly-typed programming languages that supports object sharing in a distributed system. The key insight in this system is that type information allows efficient and transparent sharing of data with both fine-grained and coarse-grained access patterns. In contrast, conventional distributed shared memory (DSM) systems that support sharing of an untyped memory region are limited to providing only one granularity with good performance. This new run-time system, SkidMarks, provides a shared object space abstraction rather than a shared address space abstraction. Three key aspects of the design are: First, SkidMarks uses type information, in particular the ability to unambiguously recognize references, to make fine-grained sharing efficient by supporting object-granularity coherence. Second, SkidMarks aggregates the communication of objects, making coarse-grained sharing efficient. Third, SkidMarks uses a globally unique "handle" rather than a virtual address to name an object, enabling each machine to allocate storage just for the objects that it accesses, improving spatial locality. We compare SkidMarks to TreadMarks, a conventional DSM system that is efficient at handling coarse-grained sharing. Our performance evaluation substantiates the following claims: The performance of coarse-grained applications is nearly as good as in TreadMarks (within 6%). Since the performance of such applications is already good in TreadMarks, we consider this an acceptable performance penalty. The performance of fine-grained applications is considerably better than in TreadMarks (up to 98% for Barnes-Hut and 62% for Water-Spatial). The performance of garbage-collected applications is considerably better than in TreadMarks (up to 150%).
  • Item
    Cache Management in Scalable Network Servers
    (2000-07-13) Pai, Vivek
    For many users, the perceived speed of computing is increasingly dependent on the performance of network server systems, underscoring the need for high performance servers. Cost-effective scalable network servers can be built on clusters of commodity components (PCs and LANs) instead of using expensive multiprocessor systems. However, network servers cache files to reduce disk access, and the cluster's physically disjoint memories complicate sharing cached file data. Additionally, the physically disjoint CPUs complicate the problem of load balancing. This work examines the issue of cache management in scalable network servers at two levels—per-node (local) and cluster-wide (global). Per-node cache management is addressed by the IO-Lite unified buffering and caching system. Applications and various parts of the operating system currently use incompatible buffering schemes, resulting in unnecessary data copying. For network servers, overall throughput drops for two reasons—copying wastes CPU cycles, and multiple copies of data compete with the filesystem cache for memory. IO-Lite allows applications, the operating system, file system, and network code to safely and securely share a single copy of data. The cluster-wide solution uses a technique called Locality-Aware Request Distribution (LARD) that examines the content of incoming requests to determine which node in a cluster should handle the request. LARD uses the request content to dynamically partition the incoming request stream. This partitioning increases the file cache hit rates on the individual nodes, and it maintains load balance in the cluster.
  • Item
    A Characterization of Compound Documents on the Web
    (1999-11-29) Lara, Eyal de; Wallach, Dan S.; Zwaenepoel, Willy
    Recent developments in office productivity suites make it easier for users to publish rich compound documents on the Web. Compound documents appear as a single unit of information but may contain data generated by different applications, such as text, images, and spreadsheets. Given the popularity enjoyed by these office suites and the pervasiveness of the Web as a publication medium, we expect that in the near future these compound documents will become an increasing proportion of the Web's content. As a result, the content handled by servers, proxies, and browsers may change considerably from what is currently observed. Furthermore, these compound documents are currently treated as opaque byte streams, but future Web infrastructure may wish to understand their internal structure to provide higher-quality service. In order to guide the design of this future Web infrastructure, we characterize compound documents currently found on the Web. Previous studies of Web content either ignored these document types altogether or did not consider their internal structure. We study compound documents originated by the three most popular applications from the Microsoft Office suite: Word, Excel, and PowerPoint. Our study encompasses over 12,500 documents retrieved from 935 different Web sites. Our main conclusions are: Compound documents are in general much larger than current HTML documents. For large documents, embedded objects and images make up a large part of the documents' size. For small documents, XML format produces much larger documents than OLE. For large documents, there is little difference. Compression considerably reduces the size of documents in both formats.
  • Item
    Programming Languages for Reusable Software Components
    (1999-07-20) Flatt, Matthew
    Programming languages offer a variety of constructs to support code reuse. For example, functional languages provide function constructs for encapsulating expressions to be used in multiple contexts. Similarly, object-oriented languages provide class (or class-like) constructs for encapsulating sets of definitions that are easily adapted for new programs. Despite the variety and abundance of such programming constructs, however, existing languages are ill-equipped to support component programming with reusable software components. Component programming differs from other forms of reuse in its emphasis on the independent development and deployment of software components. In its ideal form, component programming means building programs from off-the-shelf components that are supplied by a software-components industry. This model suggests a strict separation between the producer and consumer of a component. The separation, in turn, implies separate compilation for components, allowing a producer to test and distribute compiled components rather than proprietary source code. Since the consumer cannot modify a compiled software component, each component must be defined and compiled in a way that gives the consumer flexibility in linking components together. This dissertation shows how a language for component programming can support both separate compilation and flexible linking. To that end, it expounds the principle of external connections: a language should separate component definitions from component connections. Neither conventional module constructs nor conventional object-oriented constructs follow the principle of external connections, which explains why neither provides an effective language for component programming. We describe new language constructs for modules and classes (called units and mixins, respectively) that enable component programming in each domain.
    The unit and mixin constructs modeled in this dissertation are based on constructs that we implemented for the MzScheme programming language, a dialect of the dynamically-typed language Scheme. To demonstrate that units and mixins work equally well for statically-typed languages, such as ML or Java, we provide typed models of the constructs as well as untyped models, and we formally prove the soundness of the typed models.
  • Item
    Transformations and Transitions from the Sylvester to the Bezout Resultant
    (1999-06-17) Chionh, Eng-Wee; Goldman, Ronald; Zhang, Ming
    A simple matrix transformation linking the resultant matrices of Sylvester and Bezout is derived. This transformation matrix is then applied to generate an explicit formula for each entry of the Bezout resultant, and this entry formula is used, in turn, to construct an efficient recursive algorithm for computing all the entries of the Bezout matrix. Hybrid resultant matrices consisting of some columns from the Sylvester matrix and some columns from the Bezout matrix provide natural transitions from the Sylvester to the Bezout resultant, and allow as well the Bezout construction to be generalized to two polynomials of different degrees. Such hybrid resultants are derived here, employing again the transformation matrix from the Sylvester to the Bezout resultant.
  • Item
    New Approaches to Routing for Large-Scale Data Networks
    (1999-06-21) Chen, Johnny
    This thesis develops new routing methods for large-scale, packet-switched data networks such as the Internet. The methods developed increase network performance by considering routing approaches that take advantage of more available network resources than do current methods. Two approaches are explored: dynamic metric and multipath routing. Dynamic metric routing provides paths that change dynamically in response to network traffic and congestion, thereby increasing network performance because data travel less congested paths. The second approach, multipath routing, provides multiple paths between nodes and allows nodes to use these paths to best increase their network performance. Nodes in this environment achieve increased performance through aggregating the resources of multiple paths. This thesis implements and analyzes algorithms for these two routing approaches. The first approach develops hybrid-Scout, a dynamic metric routing algorithm that calculates independent and selective dynamic metric paths. These two calculation properties are key to reducing routing costs and avoiding routing instabilities, two difficulties commonly experienced in traditional dynamic metric routing. For the second approach, multipath routing, this thesis develops a complete multipath network that includes the following components: routing algorithms that compute multiple paths, a multipath forwarding method to ensure that data travel their specified paths, and an end-host protocol that effectively uses multiple paths. Simulations of these two routing approaches and their components demonstrate significant improvement over traditional routing strategies. The hybrid-Scout algorithm incurs from 3-4 times to 1-2 orders of magnitude less routing cost than traditional dynamic metric routing algorithms while delivering comparable network performance.
    For multipath routing, nodes using the multipath protocol fully exploit the offered paths and increase performance linearly in the additional resources provided by the multipath network. The performance improvements validate the multipath routing algorithms and the effectiveness of the proposed end-host protocol. Furthermore, this new multipath forwarding method allows multipath networks to be supported at low routing costs. This thesis demonstrates that the proposed methods to implement dynamic metric and multipath routing are efficient and deliver significant performance improvements.
  • Item
    The Block Structure of Three Dixon Resultants and Their Accompanying Transformation Matrices
    (1999-06-16) Chionh, Eng-Wee; Goldman, Ronald; Zhang, Ming
    Dixon [1908] introduces three distinct determinant formulations for the resultant of three bivariate polynomials of bidegree (m,n). The first technique applies Sylvester's dialytic method to construct the resultant as the determinant of a matrix of order 6mn. The second approach uses Cayley's determinant device to form a more compact representation for the resultant as the determinant of a matrix of order 2mn. The third method employs a combination of Cayley's determinant device with Sylvester's dialytic method to build the resultant as the determinant of a matrix of order 3mn. Here relations between these three resultant formulations are derived and the structure of the transformations between these resultant matrices is investigated. In particular, it is shown that these transformation matrices all have similar, simple, upper triangular, block symmetric structures and the blocks themselves have elegant symmetry properties. Elementary entry formulas for the transformation matrices are also provided. In light of these results, the three Dixon resultant matrices are reexamined and shown to have natural block structures compatible with the block structures of the transformation matrices. These block structures are analyzed here and applied along with the block structures of the transformation matrices to simplify the calculation of the entries of the Dixon resultants of order 2mn and 3mn and to make these calculations more efficient by removing redundant computations.
  • Item
    A Set of Convolution Identities Relating the Blocks of Two Dixon Resultant Matrices
    (1999-06-16) Chionh, Eng-Wee; Goldman, Ronald; Zhang, Ming
    Resultants for bivariate polynomials are often represented by the determinants of very big matrices. Properly grouping the entries of these matrices into blocks is a very effective tool for studying the properties of these resultants. Here we derive a set of convolution identities relating the blocks of two Dixon bivariate resultant representations.
  • Item
    TCP Implementation Enhancements for Improving Webserver Performance
    (1999-07-06) Aron, Mohit; Druschel, Peter
    This paper studies the performance of BSD-based TCP implementations in Web servers. We find that lack of scalability with respect to high TCP connection rates reduces the throughput of Web servers by up to 25% and imposes a memory overhead of up to 32 MB on the kernel. We also find that insufficient accuracy in TCP's timers results in overly conservative delays for retransmission timeouts, causing poor response time, low network utilization and throughput loss. The paper proposes enhancements to the TCP implementation that eliminate these problems, without requiring changes to the protocol or the API. We also find that conventional benchmark environments do not fully expose certain significant performance aspects of TCP implementations and propose techniques that allow these benchmarks to more accurately predict the performance of real servers.
  • Item
    A Deterministic Model for Parallel Program Performance Evaluation
    (1998-12-03) Adve, Vikram S.; Vernon, Mary K.
    Analytical models for parallel programs have been successful at providing simple qualitative insights and bounds on scalability, but have been less successful in practice for predicting detailed, quantitative information about program performance. We develop a conceptually simple model that provides detailed performance prediction for parallel programs with arbitrary task graphs, a wide variety of task scheduling policies, shared-memory communication, and significant resource contention. Unlike many previous models, our model assumes deterministic task execution times, which permits detailed analysis of synchronization, task scheduling, and the order of task execution, as well as mean values of communication costs. The assumption of deterministic task times is supported by a recent study of the influence of non-deterministic delays in parallel programs. We show that the deterministic task graph model is accurate and efficient for five shared-memory programs, including programs with large and/or complex task graphs, sophisticated task scheduling, highly non-uniform task times, and significant communication and resource contention. We also use three example programs to illustrate the predictive capabilities of the model. In two cases, broad insights and detailed metrics from the model are used to suggest improvements in load-balancing, and the model quickly and accurately predicts the impact of these changes. In the third case, further novel metrics are used to obtain insight into the impact of program design changes that improve communication locality as well as load-balancing. Finally, we briefly present results of a comparison between our model and representative models based on stochastic task execution times.
  • Item
    Improving Memory Hierarchy Performance for Irregular Applications
    (1999-03-10) Kennedy, Ken; Mellor-Crummey, John; Whalley, David
    The gap between CPU speed and memory speed in modern computer systems is widening as new generations of hardware are introduced. Loop blocking and prefetching transformations help bridge this gap for regular applications; however, these techniques don't deal well with irregular applications. This paper investigates using data and computation reordering strategies to improve memory hierarchy utilization for irregular applications on systems with multi-level memory hierarchies. We introduce multi-level blocking as a new computation reordering strategy and present novel integrations of computation and data reordering using space-filling curves. In experiments that applied a combination of data and computation reorderings to two irregular programs, overall execution time dropped by about a factor of two.
  • Item
    Operating system support for server applications
    (1999-05-25) Banga, Gaurav
    General-purpose operating systems provide inadequate support for large-scale servers. Server applications lack sufficient control over scheduling and management of machine resources, which makes it difficult to enforce priority policies, and to provide robust and controlled service. For example, server applications cannot provide differentiated quality of service to requests from different clients. The root cause of these problems is a fundamental mismatch between the original design assumptions underlying the resource management mechanisms of current general-purpose operating systems, and the behavior of modern server applications. In particular, the notions of protection domain and resource principal coincide in the process abstraction of current operating systems. Moreover, these operating systems provide insufficient control to an application over the resources that are consumed inside the kernel on behalf of the application. These aspects of current operating systems prevent a server process that manages large numbers of network connections, for example, from properly allocating system resources among those connections. This dissertation addresses the lack of operating system support for fine-grained resource management in large-scale server systems. It starts by characterizing the nature of the mismatch between the design assumptions of current general-purpose operating systems, and the behavior of server applications. The traditional design of core operating system abstractions and APIs is reevaluated in the light of the requirements of server applications. This reevaluation leads to a set of novel operating system abstractions and APIs that serve to provide effective support for server applications.
  • Item
    Bisimulation Minimization in an Automata-Theoretic Verification Framework
    (1998-10-27) Fisler, Kathi; Vardi, Moshe Y.
    Bisimulation is a seemingly attractive state-space minimization technique because it can be computed automatically and yields the smallest model preserving all mu-calculus formulas. It is considered impractical for symbolic model checking, however, because the required BDDs are prohibitively large for most designs. We revisit bisimulation minimization, this time in an automata-theoretic framework. Bisimulation has potential in this framework because after intersecting the design with the negation of the property, minimization can ignore most of the atomic propositions. We compute bisimulation using an algorithm due to Lee and Yannakakis that represents bisimulation relations by their equivalence classes and only explores reachable classes. This greatly improves on the time and memory usage of naive algorithms. We demonstrate that bisimulation is practical for many designs within the automata-theoretic framework. In most cases, however, the cost of performing this reduction still outweighs that of conventional model checking.
  • Item
    An Experimental Evaluation of List Scheduling
    (1998-09-30) Cooper, Keith D.; Schielke, Philip; Subramanian, Devika
    While altering the scope of instruction scheduling has a rich heritage in compiler literature, instruction scheduling algorithms have received little coverage in recent times. The widely held belief is that greedy heuristic techniques such as list scheduling are "good" enough for most practical purposes. The evidence supporting this belief is largely anecdotal with a few exceptions. In this paper we examine some hard evidence in support of list scheduling. To this end we present two alternative algorithms to list scheduling that use randomization: randomized backward forward list scheduling, and iterative repair. Using these alternative algorithms we are better able to examine the conditions under which list scheduling performs well and poorly. Specifically, we explore the efficacy of list scheduling in light of available parallelism, the list scheduling priority heuristic, and number of functional units. While the generic list scheduling algorithm does indeed perform quite well overall, there are important situations which may warrant the use of alternate algorithms.
  • Item
    Mathematical Properties of Variational Subdivision Schemes
    (1998-09-24) Warren, Joe
    Subdivision schemes for variational splines were introduced in a previous paper. This technical report focuses on discussing the mathematical properties of these subdivision schemes in more detail. Please read the original paper before reading this analysis.
  • Item
    A Linear Transform Scheme for Combining Weights into Scores
    (1998-10-09) Sung, Sam
    Ranking has been widely used in many applications. A ranking scheme usually employs a "scoring rule" that assigns a final numerical value to each object to be ranked. A scoring rule normally involves the use of one or many scores, and it gives more weight to the scores that are more important. In this paper, we give a scheme that can combine weights into scores in a natural way. We compare our scheme to the formula given by Fagin. We give additional properties that a weighted scoring rule should possess. Some interesting issues concerning weighted scoring rules are also discussed.
  • Item
    Issues in Instruction Scheduling
    (1998-09-15) Schielke, Philip
    Instruction scheduling is a code reordering transformation that attempts to hide latencies present in modern-day microprocessors. Current applications of these microprocessors and the microprocessors themselves present new parameters under which the scheduler must operate. For example, some multiple functional unit processors have partitioned register sets. In some applications, increasing the static size of a program may not be an acceptable tradeoff for gaining improved running time. The interaction between the scheduler and the register allocator can also dramatically affect the performance of the compiled code. In this work we will look at global scheduling techniques that do not replicate code, including scheduling over extended basic blocks. We also look at a replacement for the traditional list scheduler based on the techniques of iterative repair. Finally, we explore the interaction between instruction scheduling and register allocation, and look at ways of combining the two problems.
  • Item
    A Simple, Practical Distributed Multi-Path Routing Algorithm
    (1998-07-16) Chen, Johnny; Druschel, Peter; Subramanian, Devika
    We present a simple and practical distributed routing algorithm based on backward learning. The algorithm periodically floods scout packets that explore paths to a destination in reverse. Scout packets are small and of fixed size; therefore, they lend themselves to hop-by-hop piggy-backing on data packets, largely defraying their cost to the network. The correctness of the proposed algorithm is analytically verified. Our algorithm also has loop-free multi-path routing capabilities, providing increased network utilization and route stability. The Scout algorithm requires very little state and computation in the routers, and can efficiently and gracefully handle high rates of change in the network's topology and link costs. An extensive simulation study shows that the proposed algorithm is competitive with link-state and distance vector algorithms, particularly in highly dynamic networks.
  • Item
    A New Approach to Routing With Dynamic Metrics
    (1998-11-18) Chen, Johnny; Druschel, Peter; Subramanian, Devika
    We present a new routing algorithm to compute paths within a network using dynamic link metrics. Dynamic link metrics are cost metrics that depend on a link's dynamic characteristics, e.g., the congestion on the link. Our algorithm is destination-initiated: the destination initiates a global path computation to itself using dynamic link metrics. All other destinations that do not initiate this dynamic metric computation use paths that are calculated and maintained by a traditional routing algorithm using static link metrics. Analysis of Internet packet traces shows that a high percentage of network traffic is destined for a small number of networks. Because our algorithm is destination-initiated, it achieves maximum performance at minimum cost when it only recomputes dynamic metric paths to these selected "hot" destination networks. This selective approach to route recomputation reduces many of the problems (principally route oscillations) associated with calculating all routes simultaneously. We compare the routing efficiency and end-to-end performance of our algorithm against those of traditional algorithms using dynamic link metrics. The results of our experiments show that our algorithm can provide higher network performance at a significantly lower routing cost under conditions that arise in real networks. The effectiveness of the algorithm stems from the independent, time-staggered recomputation of important paths using dynamic metrics, allowing for splits in congested traffic that cannot be made by traditional routing algorithms.
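The selective idea described in the abstract above (dynamic metrics only for "hot" destinations, static metrics for everything else) can be illustrated with a small sketch. This is not the report's algorithm: it is a centralized toy that assumes a hypothetical `links` table holding both a static and a dynamic cost per directed link, and it substitutes a plain Dijkstra search run backward from each destination for the distributed, destination-initiated computation. The names `paths_to`, `routing_tables`, and the example topology are illustrative assumptions.

```python
import heapq

def paths_to(dest, links, metric):
    """Shortest paths toward `dest`, computed by Dijkstra over reversed links.

    links:  dict mapping a directed link (u, v) to {"static": c, "dynamic": c}
    metric: which cost to use, "static" or "dynamic"
    Returns {node: (cost_to_dest, next_hop)}; the destination maps to (0, None).
    """
    # Index links by their head so the search can walk backward from dest.
    incoming = {}
    for (u, v), costs in links.items():
        incoming.setdefault(v, []).append((u, costs[metric]))

    best = {dest: (0, None)}
    heap = [(0, dest)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node][0]:
            continue  # stale heap entry
        for prev, cost in incoming.get(node, []):
            cand = dist + cost
            if prev not in best or cand < best[prev][0]:
                best[prev] = (cand, node)
                heapq.heappush(heap, (cand, prev))
    return best

def routing_tables(links, destinations, hot):
    """Use dynamic metrics only for destinations in `hot` (assumed given)."""
    return {
        d: paths_to(d, links, "dynamic" if d in hot else "static")
        for d in destinations
    }

# Hypothetical topology: the A->B link is cheap when idle but congested now.
links = {
    ("A", "B"): {"static": 1, "dynamic": 10},
    ("B", "D"): {"static": 1, "dynamic": 1},
    ("A", "C"): {"static": 2, "dynamic": 2},
    ("C", "D"): {"static": 2, "dynamic": 2},
}
static_only = routing_tables(links, ["D"], hot=set())
with_hot = routing_tables(links, ["D"], hot={"D"})
```

In this toy network, treating D as a hot destination moves A's next hop away from the congested A->B link, while a destination left out of `hot` keeps its static-metric path.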