Computer Science
Browsing Computer Science by Issue Date
Now showing 1 - 20 of 435
Item: Analysis of Synchronization in a Parallel Programming Environment (1990-08)
Subhlok, Jaspal S.
Parallel programming is an intellectually demanding task. One of the most difficult challenges in the development of parallel programs for asynchronous shared memory systems is avoiding errors caused by inadvertent data sharing, often referred to as data races. Static prediction of data races requires data dependence analysis, as well as analysis of parallelism and synchronization. This thesis addresses synchronization analysis of parallel programs. Synchronization analysis enables accurate prediction of data races in a parallel program. The results of synchronization analysis can also be used in a variety of other ways to enhance the power and flexibility of a parallel programming environment. We introduce the notion of schedule-correctness of a parallel program and relate it to data dependences and execution orders in the program. We develop a framework for reasoning about execution orders and prove that static determination of execution orders in parallel programs with synchronization is NP-hard, even for a simple language. We present two different algorithms for synchronization analysis that determine whether the cumulative effect of synchronization is sufficient to ensure the execution ordering required by data dependence. The first algorithm iteratively computes and propagates ordering information between neighbors in the program flow graph, analogous to data flow analysis algorithms. The second algorithm computes the necessary path information in the program graph and uses it to transform the problem into an integer programming problem, which also is NP-hard. We present a heuristic approach to solving the integer programming problem obtained and argue that it is efficient for the problem cases that we expect to encounter in our analysis. We discuss the merits, shortcomings and suitability of the two algorithms presented. We have developed a prototype implementation of synchronization analysis. We discuss the implementation and practical issues relating to the effectiveness and usefulness of our analysis.
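The ordering question at the heart of this analysis can be pictured as reachability in a graph of guaranteed orderings. The sketch below is purely illustrative: it assumes the events and the program-order/synchronization edges are already given, which is precisely the part the thesis shows to be hard in general.

```python
# Illustrative only: decide whether synchronization forces one event before
# another, modeling program order and sync (e.g., post -> wait) as edges.

def ordered(events, edges, a, b):
    """True if b is reachable from a, i.e., every execution runs a before b
    (assuming each edge is a guaranteed ordering)."""
    succ = {e: set() for e in events}
    for u, v in edges:
        succ[u].add(v)
    stack, seen = [a], set()
    while stack:
        u = stack.pop()
        if u == b:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(succ[u])
    return False

def potential_race(events, edges, a, b):
    """Two conflicting accesses race unless some ordering covers them."""
    return (not ordered(events, edges, a, b)
            and not ordered(events, edges, b, a))

# Write W in one task, read R in another, ordered only through post/wait:
events = ['W', 'post', 'wait', 'R']
edges = {('W', 'post'), ('post', 'wait'), ('wait', 'R')}
print(potential_race(events, edges, 'W', 'R'))  # False: the accesses are ordered
```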
Item: Combining Particles and Waves for Fluid Animation (1992-04)
Hall, Mark
Modeling fluid motion is a problem largely unsolved by traditional modeling techniques. Animation of fluid motion has been possible only in special cases, falling into one of two general categories. Upper surface representations model wave phenomena for fluid in placid situations, such as calm ocean waves. Particle systems define the chaotic motion of fluid in highly volatile states, like waterfalls. Each technique can mimic physical motion of liquid only in limited situations. We propose a system that unifies the previous techniques and contains fluid of both categories. Two representations for fluid, corresponding to previous methods, allow modeling a wide range of situations. Automatic transitions between representations allow using the most appropriate technique for a given physical situation. Fluid is represented by two types of primitives: drops and pools. The drops constitute a particle system describing small, independent components. Pools model large bodies of fluid in more placid situations. The notion of support differentiates the situations that each representation models best. In general terms, fluid is supported when there is a solid underneath the fluid. Previous techniques generally assume either that support is omnipresent or that support is nonexistent. In our system, support determines which representation should be used and when transitions between the two should occur. Supported drops flatten and become pools. Unsupported pools spawn drops. Combining the two techniques into a single system allows mimicking fluid in a broader range of physical situations than previous methods. The resulting system models fluid motion based on physical properties of the environment. Gravity causes fluid to fall or flow downward. Solids restrict fluid motion, changing the course of flowing fluid and defining the shape of contained fluid.

Item: Register Allocation via Graph Coloring (1992-04)
Briggs, Preston
Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results:
- Optimistic coloring: Chaitin's coloring heuristic pessimistically assumes that any node of high degree will not be colored and must therefore be spilled. By optimistically assuming that nodes of high degree will receive colors, I often achieve lower spill costs and faster code; my results are never worse.
- Coloring pairs: The pessimism of Chaitin's coloring heuristic is emphasized when trying to color register pairs. My heuristic handles pairs as a natural consequence of its optimism.
- Rematerialization: Chaitin et al. introduced the idea of rematerialization to avoid the expense of spilling and reloading certain simple values. By propagating rematerialization information around the SSA graph using a simple variation of Wegman and Zadeck's constant propagation techniques, I discover and isolate a larger class of such simple values.
- Live range splitting: Chow and Hennessy's technique, priority-based coloring, includes a form of live range splitting. By aggressively splitting live ranges at selected points before coloring, I am able to incorporate live range splitting into the framework of Chaitin's allocator.
Additionally, I report the results of experimental studies measuring the effectiveness of each of my improvements. I also report the results of an experiment suggesting that priority-based coloring requires O(n^2) time and that the Yorktown allocator requires only O(n log n) time. Finally, I include a chapter describing many implementation details, with further measurements designed to provide an accurate intuition about the time and space requirements of coloring allocators.
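The optimistic heuristic is compact enough to sketch. The following is a reconstruction of the idea, not code from the thesis: the interference graph is a plain adjacency map, spill costs are ignored, and register pairs are not handled.

```python
# Sketch of optimistic graph coloring: high-degree nodes are pushed on the
# stack instead of being spilled immediately, and spilling is decided only
# if no color turns out to be free at selection time.

def color(graph, k):
    """graph: {node: set of interfering nodes}; returns (coloring, spills)."""
    degree = {v: len(adj) for v, adj in graph.items()}
    remaining = set(graph)
    stack = []
    while remaining:
        low = [v for v in remaining if degree[v] < k]
        # Chaitin would spill here when `low` is empty; optimism pushes a
        # high-degree node anyway, hoping its neighbors will share colors.
        v = low[0] if low else max(remaining, key=degree.get)
        stack.append(v)
        remaining.remove(v)
        for w in graph[v]:
            if w in remaining:
                degree[w] -= 1
    coloring, spills = {}, []
    while stack:
        v = stack.pop()
        used = {coloring[w] for w in graph[v] if w in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[v] = free[0]
        else:
            spills.append(v)    # only now does the node actually spill
    return coloring, spills

# A 4-cycle is 2-colorable even though every node has degree 2: optimism
# finds the coloring (and no spills) where pure pessimism would spill.
print(color({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}, 2))
```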
Item: Hierarchical Attribute Grammars: Dialects, Applications and Evaluation Algorithms (1992-05)
Carle, Alan
Although attribute grammars have been applied successfully to the specification of many different phases of analysis and transformation in complex language processing systems, including type checking, data flow analysis, constant propagation and dead code elimination, little success has been achieved in applying attribute grammars to the specification of complete systems such as multiple pass code optimizers or automatic parallelizers. This is the direct result of the failure of typical attribute grammar dialects to provide any means for composing attribute grammar specifications of sub-computations into a specification of a complete hierarchical computation. This dissertation introduces the notion of hierarchical attribute grammars, a set of attribute grammar dialects that are sufficiently expressive to permit the natural description of complex computations through the composition of attribute-grammar-specified sub-computations within the context of the original attribute grammar formalism. The set of hierarchical dialects includes Schulz's attributed transformations, the Attribute Coupled Grammars of Ganzinger and Giegerich, SSL, the specification language of the Synthesizer Generator of Reps and Teitelbaum, the Higher Order Attribute Grammars of Vogt, Swierstra and Kuiper, and a new dialect, Modular Attribute Grammars. The relationships between these five dialects are examined, and examples of Modular Attribute Grammar specifications are presented. For hierarchical attribute grammar dialects to be useful, efficient batch and incremental evaluators for hierarchical dialects must be developed. Therefore, the majority of this dissertation is dedicated to the presentation of new batch and incremental evaluation algorithms for hierarchical specifications.

Item: Surface Approximation By Low Degree Patches With Multiple Representations (1992-08)
Lodha, Suresh Kumar
Computer Aided Geometric Design (CAGD) is concerned with the representation and approximation of curves and surfaces when these objects have to be processed by a computer. Parametric representations are very popular because they allow considerable flexibility for shaping and design. Implicit representations are convenient for determining whether a point is inside, outside or on the surface. These representations offer many complementary advantages. Therefore, it is desirable to build geometric models with surfaces which have both parametric and implicit representations. Keeping the degree of the surfaces low is important for practical reasons: both the size of the surface representation and the difficulty of algorithms such as root finding grow quickly with increasing degree. This thesis introduces low degree surfaces with both parametric and implicit representations and investigates their properties. A new method is described for creating quadratic triangular Bezier surface patches which lie on implicit quadric surfaces. Another method is described for creating biquadratic tensor product Bezier surface patches which lie on implicit cubic surfaces. The resulting surface patches satisfy all of the standard properties of parametric Bezier surfaces, including interpolation of the corners of the control polyhedron and the convex hull property. The second half of this work describes a scheme for filling n-sided holes and for approximating the resulting smooth surface, consisting of high degree parametric Bezier surface patches, by a continuous surface consisting of low degree patches with both parametric and implicit representations. A new technique is described for filling an n-sided hole smoothly using a single parametric surface patch with a geometrically intuitive compact representation. Next, a new degree reduction algorithm is applied to approximate high degree parametric Bezier surfaces by low degree Bezier surfaces. Finally, a variant of the least squares technique is used to approximate parametric Bezier surfaces of low degree by low degree surfaces with both parametric and implicit representations. The resulting surfaces have boundary continuity and approximation properties.
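For readers unfamiliar with triangular patches: a quadratic triangular Bezier patch is evaluated by two rounds of barycentric interpolation over its six control points. The sketch below shows this de Casteljau-style evaluation; the control-point layout and all names are ours, invented for illustration.

```python
# Evaluate a quadratic triangular Bezier patch by de Casteljau's algorithm.
# Control points are indexed by multi-indices (i, j, k) with i + j + k = 2.

def bary(p, q, r, u, v, w):
    """Barycentric combination u*p + v*q + w*r of 3-D points."""
    return tuple(u * a + v * b + w * c for a, b, c in zip(p, q, r))

def eval_patch(cp, u, v, w):
    """cp maps (i, j, k), i + j + k == 2, to control points;
    (u, v, w) are barycentric coordinates with u + v + w == 1."""
    # First de Casteljau step: degree-2 control net -> degree-1 net.
    b1 = {idx: bary(cp[(idx[0] + 1, idx[1], idx[2])],
                    cp[(idx[0], idx[1] + 1, idx[2])],
                    cp[(idx[0], idx[1], idx[2] + 1)], u, v, w)
          for idx in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]}
    # Second step collapses the degree-1 net to the surface point.
    return bary(b1[(1, 0, 0)], b1[(0, 1, 0)], b1[(0, 0, 1)], u, v, w)

# Six control points; the corners are interpolated, as the abstract notes.
cp = {(2, 0, 0): (0.0, 0.0, 0.0), (0, 2, 0): (1.0, 0.0, 0.0),
      (0, 0, 2): (0.0, 1.0, 0.0), (1, 1, 0): (0.5, 0.0, 0.5),
      (1, 0, 1): (0.0, 0.5, 0.5), (0, 1, 1): (0.5, 0.5, 0.5)}
print(eval_patch(cp, 1.0, 0.0, 0.0))  # corner control point (0.0, 0.0, 0.0)
print(eval_patch(cp, 1/3, 1/3, 1/3))  # an interior surface point
```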
Item: Soft Typing: An Approach to Type Checking for Dynamically Typed Languages (1992-08)
Fagan, Mike
In an effort to avoid improper use of program functions, modern programming languages employ some kind of preventative type system. These type systems can be classified as either static or dynamic. Static type systems detect "ill-typed" program phrases at compile-time, whereas dynamic type systems detect "ill-typed" phrases at run-time. Static typing systems have two important advantages over dynamically typed systems: First, they provide important feedback to the programmer by detecting a large class of program errors before execution. Second, they extract information that a compiler can exploit to produce more efficient code. The price paid for these advantages, however, is a loss of expressiveness and modularity. It is easy to prove that a static type system for an "interesting" programming language necessarily excludes some "good" programs. This paper focuses on the problem of designing programming systems that retain all the expressiveness of dynamic typing, but still offer the early error detection and improved optimization opportunities of static typing. To that end, we introduce a concept called soft typing. The key concept of soft typing is that a type checker need not reject programs containing statically "ill-typed" phrases. Instead, the soft type checker inserts explicit run-time checks. Thus, there are two issues to be addressed in the design of soft typing systems. First, the typing mechanism must provide reasonable feedback to programmers accustomed to dynamically typed languages. Current static systems fail to satisfy the programmer's intuition about correctness on many programs. Second, a soft typing system must sensibly insert run-time checks (when necessary). This paper develops a type system and checking algorithms that are suitable for soft typing a significant class of programming languages.

Item: Memory-Hierarchy Management (1992-09)
Carr, Steve
The trend in high-performance microprocessor design is toward increasing computational power on the chip. Microprocessors can now process dramatically more data per machine cycle than previous models. Unfortunately, memory speeds have not kept pace. The result is an imbalance between computation speed and memory speed. This imbalance is leading machine designers to use more complicated memory hierarchies. In turn, programmers are explicitly restructuring codes to perform well on particular memory systems, leading to machine-specific programs. It is our belief that machine-specific programming is a step in the wrong direction. Compilers, not programmers, should handle machine-specific implementation details. To this end, this thesis develops and experiments with compiler algorithms that manage the memory hierarchy of a machine for floating-point intensive numerical codes. Specifically, we address the following issues:
- Scalar replacement. Lack of information concerning the flow of array values in standard data-flow analysis prevents the capturing of array reuse in registers. We develop and experiment with a technique to perform scalar replacement in the presence of conditional control flow to expose array reuse to standard data-flow algorithms.
- Unroll-and-jam. Many loops require more data per cycle than can be processed by the target machine. We present and experiment with an automatic technique to apply unroll-and-jam to such loops to reduce their memory requirements.
- Loop interchange. Cache locality in programs run on advanced microprocessors is critical to performance. We develop and experiment with a technique to order loops within a nest to attain good cache locality.
- Blocking. Iteration-space blocking is a technique used to attain temporal locality within cache. Although it has been applied to "simple" kernels, there has been no investigation into its applicability over a range of algorithmic styles. We show how to apply blocking to loops with trapezoidal-, rhomboidal-, and triangular-shaped iteration spaces. In addition, we show how to overcome certain complex dependence patterns.
Experiments with the above techniques have shown that integer-factor speedups on a single chip are possible. These results reveal that many numerical algorithms can be expressed in a natural, machine-independent form while retaining good memory performance through the use of compiler optimizations.
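Iteration-space blocking is easiest to see on matrix multiplication. The sketch below is a generic illustration, not code from the thesis; BS stands in for a block size a compiler would derive from the target's cache parameters, and the hoisting of A[i][k] shows scalar replacement in miniature.

```python
# Blocked (tiled) matrix multiplication: the standard illustration of
# iteration-space blocking. BS is an arbitrary stand-in for a compiler-
# chosen, cache-dependent block size.

def matmul_blocked(A, B, n, BS=32):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, BS):
        for kk in range(0, n, BS):
            for jj in range(0, n, BS):
                # The three submatrices touched here fit in cache together,
                # so each element fetched from memory is reused many times.
                for i in range(ii, min(ii + BS, n)):
                    for k in range(kk, min(kk + BS, n)):
                        a = A[i][k]          # scalar replacement of A[i][k]
                        row_b, row_c = B[k], C[i]
                        for j in range(jj, min(jj + BS, n)):
                            row_c[j] += a * row_b[j]
    return C

# 4x4 sanity check: multiplying by the identity returns A unchanged.
n = 4
A = [[float(i * n + j) for j in range(n)] for i in range(n)]
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
assert matmul_blocked(A, I, n, BS=2) == A
```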
Item: An Optimizing Fortran D Compiler for MIMD Distributed-Memory Machines (1993-01)
Tseng, Chau-Wen
Massively parallel MIMD distributed-memory machines can provide enormous computational power; however, the difficulty of developing parallel programs for these machines has limited their use. Our thesis is that an advanced compiler can generate efficient parallel programs if data decompositions are provided. To validate this thesis, we have implemented a compiler for Fortran D, a version of Fortran that provides data decomposition specifications at two levels: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution functions. The Fortran D compiler is organized around three major functions: program analysis, program optimization, and code generation. Its compilation strategy is based on the "owner computes" rule, where each processor only computes values of data it owns. Data decomposition specifications are translated into mathematical distribution functions that determine the ownership of local data. By composing these with subscript functions or their inverses, the compiler can efficiently partition computation and determine nonlocal accesses at compile-time. Fortran D optimizations are guided by the concept of data dependence. Program transformations modify the program execution order to enable optimizations. Communication optimizations reduce the number of messages and overlap communication with computation. Parallelism optimizations detect reductions and optimize pipelined computations to increase the amount of useful computation that may be performed in parallel. Empirical evaluations show that exploiting parallelism is vital, while message vectorization, coarse-grain pipelining, and collective communication are the key communication optimizations. A simple model is constructed to guide compiler optimizations. Loop indices, bounds, and nonlocal storage are managed by the compiler during code generation. Interprocedural analysis, optimization, and code generation algorithms limit compilation to only one pass over each procedure by collecting summary information after edits, then compiling procedures in reverse topological order to propagate necessary information. Delaying instantiation of the work partition, communication, and dynamic data decomposition enables interprocedural optimization. Interactions between the compiler and other elements of the programming system are discussed. Empirical measurements show that the output of the prototype Fortran D compiler is comparable to hand-written codes on the Intel iPSC/860 and significantly outperforms the CM Fortran compiler on the Thinking Machines CM-5.
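The "owner computes" rule can be sketched for a 1-D block distribution. The function names and the explicit ghost values below are schematic stand-ins for what the compiler derives and communicates automatically; this is an illustration, not Fortran D compiler output.

```python
# Sketch of the owner-computes rule for a 1-D block-distributed array.

def block_bounds(n, nprocs, me):
    """Global index range [lo, hi) of the array section owned by `me`."""
    size = (n + nprocs - 1) // nprocs
    lo = me * size
    return lo, min(lo + size, n)

def local_step(a_local, left_ghost, right_ghost):
    """Each processor computes only the elements it owns:
    new a[i] = (a[i-1] + a[i+1]) / 2. Compile-time analysis shows the only
    nonlocal accesses are the two boundary neighbors, which arrive as
    ghost values from the adjacent processors."""
    ext = [left_ghost] + a_local + [right_ghost]
    return [(ext[j] + ext[j + 2]) / 2.0 for j in range(len(a_local))]

# Example: 12 elements over 3 processors; processor 1 owns [4, 8).
lo, hi = block_bounds(12, 3, 1)
a = [float(i) for i in range(lo, hi)]           # owned section
print(local_step(a, left_ghost=float(lo - 1),   # a[3], received from proc 0
                 right_ghost=float(hi)))        # a[8], received from proc 2
```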
Item: Polymorphism for Imperative Languages without Imperative Types (1993-02-18)
Wright, Andrew
The simple and elegant Hindley/Milner polymorphic type discipline is the basis of the type system of Standard ML, but ML's imperative features are a blight on this otherwise clean landscape. Polymorphism and imperative features cannot freely coexist without compromising type safety, hence Standard ML assigns imperative types of limited polymorphism to procedures that use references, exceptions, or continuations. Several other solutions exist, but all introduce new kinds of types that complicate the type system, contaminate module signatures, and violate abstraction by revealing the pure or imperative nature of a procedure in its type. We propose a seemingly radical alternative: by restricting polymorphism to values, imperative procedures have the same types as their behaviorally equivalent functional counterparts. Although the resulting type system does not accept all expressions typable in the purely functional sublanguage, this limitation is seldom encountered in practice. The vast majority of ML code already satisfies the restriction of polymorphism to values, and simple syntactic modifications fix the few non-conforming programs.

Item: Manetho: Fault Tolerance in Distributed Systems Using Rollback-Recovery and Process Replication (1993-10)
Elnozahy, Elmootazbellah
This dissertation presents a new protocol that allows rollback-recovery and process replication to co-exist in a distributed system. The protocol relies on a novel data structure called the antecedence graph, which tracks the nondeterministic events during failure-free operation and provides information for recreating them if a failure occurs. The rollback-recovery part of the protocol combines the low failure-free overhead of optimistic rollback-recovery with the advantages of pessimistic rollback-recovery, namely fast output commit, limited rollback, and failure containment. The process replication part of the protocol features a new multicast protocol designed specifically to support process replication. Unlike previous work, the new protocol provides high throughput and low latency in message delivery without relying on the application semantics. The protocol has been implemented in the Manetho prototype. Experience with a number of long-running, compute-intensive parallel applications confirms the performance advantages of the new protocol. The implementation also features several performance optimizations that are applicable to other rollback-recovery and multicast protocols.

Item: A Practical Soft Type System for Scheme (1993-12-06)
Cartwright, Robert; Wright, Andrew
Soft type systems provide the benefits of static type checking for dynamically typed languages without rejecting untypable programs. A soft type checker infers types for variables and expressions and inserts explicit run-time checks to transform untypable programs to typable form. We describe a practical soft type system for R4RS Scheme. Our type checker uses a representation for types that is expressive, easy to interpret, and supports efficient type inference. Soft Scheme supports all of R4RS Scheme, including procedures of fixed and variable arity, assignment, continuations, and top-level definitions. Our implementation is available by anonymous FTP.
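The check-insertion idea behind soft typing can be seen on a toy expression language. The sketch below is ours, with a deliberately crude local inference far weaker than Soft Scheme's; it only shows untypable phrases being wrapped in run-time checks rather than rejected.

```python
# Toy soft typing over a tiny expression language: infer a crude type for
# each subexpression and wrap only the operands that cannot be proven safe.

def infer(e):
    """Crude local types: 'num', 'pair', or 'any'."""
    if isinstance(e, (int, float)):
        return 'num'
    if isinstance(e, tuple) and e[0] == 'cons':
        return 'pair'
    return 'any'            # variables, results of car/cdr, etc.

def soft_check(e):
    """Transform e so every primitive application is safe: statically
    well-typed operands are left alone; the rest get a run-time CHECK."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [soft_check(a) for a in args]
    if op in ('car', 'cdr') and infer(args[0]) != 'pair':
        args[0] = ('CHECK-pair', args[0])
    if op == '+':
        args = [a if infer(a) == 'num' else ('CHECK-num', a) for a in args]
    return (op, *args)

print(soft_check(('car', ('cons', 1, 2))))   # unchanged: provably a pair
print(soft_check(('+', ('car', 'lst'), 1)))  # both checks inserted around car
```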
Item: Fully Abstract Semantics for Observably Sequential Languages (1994-01)
Cartwright, Robert; Curien, Pierre-Louis; Felleisen, Matthias
One of the major challenges in denotational semantics is the construction of a fully abstract semantics for a higher-order sequential programming language. For the past fifteen years, research on this problem has focused on developing a semantics for PCF, an idealized functional programming language based on the typed lambda calculus. Unlike most practical languages, PCF has no facilities for observing and exploiting the evaluation order of arguments to procedures. Since we believe that these facilities play a crucial role in sequential computation, this paper focuses on a sequential extension of PCF, called SPCF, that includes two classes of control operators: a possibly empty set of error generators and a collection of catch and throw constructs. For each set of error generators, the paper presents a fully abstract semantics for SPCF. If the set of error generators is empty, the semantics interprets all procedures, including catch and throw, as Berry-Curien sequential algorithms. If the language contains error generators, procedures denote manifestly sequential functions. The manifestly sequential functions form a Scott domain that is isomorphic to a domain of decision trees, which is the natural extension of the Berry-Curien domain of sequential algorithms in the presence of errors.

Item: Efficient Distributed Shared Memory Based on Multi-Protocol Release Consistency (1994-01)
Carter, John B.
A distributed shared memory (DSM) system allows shared memory parallel programs to be executed on distributed memory multiprocessors. The challenge in building a DSM system is to achieve good performance over a wide range of shared memory programs without requiring extensive modifications to the source code. The performance challenge translates into reducing the amount of communication performed by the DSM system to that performed by an equivalent message passing program. This thesis describes four novel techniques for reducing the communication overhead of DSM: (i) the use of software release consistency, (ii) support for multiple consistency protocols, (iii) a multiple writer protocol, and (iv) an update timeout mechanism. Release consistency allows modifications of shared data to be handled via a delayed update queue, which masks network latencies. Providing multiple consistency protocols allows each shared variable to be kept consistent using a protocol well-suited to the way it is accessed. A multiple writer protocol addresses the problem of false sharing by reducing the amount of unnecessary communication performed to keep falsely shared data consistent. The update timeout mechanism reduces the impact of updates to stale data. These techniques have been implemented in the Munin DSM system. The impact of these features is evaluated by comparing the performance of a collection of shared memory programs running under Munin with equivalent message passing and conventional DSM programs. Over half of the shared memory programs achieved at least 95% of the speedup of their message passing equivalents. For the other programs, the performance bottlenecks were removed via minor program modifications. Furthermore, Munin programs achieved from 25% to over 100% higher speedups than equivalent conventional DSM programs when there was a high degree of sharing. The results indicate that DSM can be a viable alternative to message passing if the amount of unnecessary communication is minimized.
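A multiple writer protocol is commonly realized with a twin-and-diff scheme: copy a page on its first write, and at release time ship only the bytes that changed. The miniature below is our reconstruction of that mechanism, not Munin's source; the names and the page model are invented.

```python
# Twin-and-diff sketch of a multiple writer protocol. A "page" is modeled
# as a bytearray for illustration.

class Page:
    def __init__(self, data):
        self.data = bytearray(data)
        self.twin = None

    def write(self, offset, value):
        if self.twin is None:
            self.twin = bytes(self.data)   # twin made on first write
        self.data[offset] = value

    def diff(self):
        """At release: encode only the bytes this writer changed."""
        if self.twin is None:
            return []
        return [(i, b) for i, (a, b) in enumerate(zip(self.twin, self.data))
                if a != b]

    def apply(self, delta):
        """Merging diffs from writers of disjoint bytes keeps a falsely
        shared page consistent without shipping whole pages back and forth."""
        for i, b in delta:
            self.data[i] = b

p1, p2 = Page(b'\x00' * 8), Page(b'\x00' * 8)
p1.write(0, 7)                     # two processors write disjoint parts
p2.write(5, 9)                     # of the same (falsely shared) page
d1, d2 = p1.diff(), p2.diff()      # diffs computed at release time
p1.apply(d2); p2.apply(d1)
assert p1.data == p2.data          # both copies converge
```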
Item: Typed Fusion with Applications to Parallel and Sequential Code Generation (1994-01-01)
Kennedy, Ken; McKinley, Kathryn S.
Loop fusion is a program transformation that merges multiple loops into one and is an effective optimization both for increasing the granularity of parallel loops and for improving data locality. This paper introduces typed fusion, a formulation of loop fusion which captures the fusion and distribution problems encountered in sequential and parallel program optimization. Typed fusion is more general and applicable than previous work. We present a fast algorithm for typed fusion on a graph G = (N, E), where nodes represent loops, edges represent dependence constraints between loops, and each loop is assigned one of T distinct types. Only nodes of the same type may be fused. The asymptotic time bound for this algorithm is O((N + E)T). The fastest previous algorithm considered only one or two types, but was still O(NE) [KM93]. When T > 2 and there is no reason to prefer fusing one type over another, we prove the problem of finding a fusion with the fewest resultant loops to be NP-hard. Using typed fusion, we present fusion and distribution algorithms that improve data locality and a parallel code generation algorithm that incorporates compound transformations. We also give evidence of the effectiveness of this algorithm in practice.
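To convey the flavor of typed fusion, here is a greedy simplification (ours, not the paper's O((N + E)T) algorithm): walk the loops in topological order and merge each into an earlier same-typed cluster, provided no fusion-preventing edge intervenes and no loop outside the cluster lies on a dependence path into it.

```python
# Greedy typed-fusion sketch: loops are graph nodes, each with a type; only
# same-typed loops may fuse. All structure here is invented for illustration.

def typed_fusion(n, edges, node_type, preventing=frozenset()):
    """Nodes 0..n-1 in topological order; edges are (u, v) dependences.
    Returns a map from node to cluster id; equal ids mean 'fused'."""
    succ = {u: set() for u in range(n)}
    for u, v in edges:
        succ[u].add(v)

    def outsider_on_path(members, v):
        # True if a path from the cluster reaches v through a non-member:
        # fusing v into the cluster would then be illegal.
        stack = [w for m in members for w in succ[m]
                 if w not in members and w != v]
        seen = set()
        while stack:
            w = stack.pop()
            if w == v:
                return True
            if w not in seen:
                seen.add(w)
                stack.extend(succ[w])
        return False

    cluster, members = {}, {}
    for v in range(n):
        for c, ms in members.items():
            if (all(node_type[m] == node_type[v] for m in ms)
                    and all((m, v) not in preventing for m in ms)
                    and not outsider_on_path(ms, v)):
                cluster[v] = c
                ms.add(v)
                break
        else:
            cluster[v] = len(members)
            members[cluster[v]] = {v}
    return cluster

# Two parallel loops separated by a sequential loop cannot all fuse:
print(typed_fusion(3, [(0, 1), (1, 2)], ['par', 'seq', 'par']))
# -> {0: 0, 1: 1, 2: 2}: three distinct clusters
```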
Item: Automatic and Interactive Parallelization (1994-03)
McKinley, Kathryn
The goal of this dissertation is to give programmers the ability to achieve high performance by focusing on developing parallel algorithms, rather than on architecture-specific details. The advantages of this approach also include program portability and legibility. To achieve high performance, we provide automatic compilation techniques that tailor parallel algorithms to shared-memory multiprocessors with local caches and a common bus. In particular, the compiler maps complete applications onto the specifics of a machine, exploiting both parallelism and memory. To optimize complete applications, we develop novel, general algorithms to transform loops that contain arbitrary conditional control flow. In addition, we provide new interprocedural transformations which enable optimization across procedure boundaries. These techniques provide the basis for a robust automatic parallelizing algorithm that is applicable to complete programs. The algorithm for automatic parallel code generation takes into consideration the interaction of parallelism and data locality, as well as the overhead of parallelism. The algorithm is based on a simple cost model that accurately predicts cache line reuse from multiple accesses to the same memory location and from consecutive accesses. The optimizer uses this model to improve data locality. It also uses the model to discover and introduce effective parallelism that complements the benefits of data locality. The optimizer further improves the effectiveness of parallelism by seeking to increase its granularity. Parallelism is introduced only when granularity is sufficient to overcome its associated costs. The algorithm for parallel code generation is shown to be efficient, and several of its component algorithms are proven optimal. The efficacy of the optimizer is illustrated with experimental results. In most cases, it is very effective and either matches or improves on the performance of hand-crafted parallel programs. When performance is not satisfactory, we provide an interactive parallel programming tool which combines compiler analysis and algorithms with human expertise.

Item: Separators in Graphs with Negative and Multiple Vertex Weights (1994-04)
Djidjev, Hristo N.; Gilbert, John
A separator theorem for a class of graphs asserts that every graph in the class can be divided approximately in half by removing a set of vertices of specified size. Nontrivial separator theorems hold for several classes of graphs, including graphs of bounded genus and chordal graphs. We show that any separator theorem implies various weighted separator theorems. In particular, we show that if the vertices of the graph have real-valued weights, which may be positive or negative, then the graph can be divided exactly in half according to weight. If k unrelated sets of weights are given, the graph can be divided simultaneously by all sets of weights. These results considerably strengthen earlier results of Gilbert, Lipton, and Tarjan: (1) for k = 1 with the weights restricted to be nonnegative, and (2) for k > 1, nonnegative weights, and simultaneous division within a factor of (1 + ε) of exactly in half.

Item: Models of Control and Their Implications for Programming Language Design (1994-04)
Sitaram, Dorai
This work uses denotational models to understand and improve the part of a programming language concerned with non-local control operators. These operators let the programmer identify and restore arbitrary control contexts in the program execution path, and thus form a powerful component of many modern programming languages. We use a variety of denotational models to tackle the issues of (1) describing a control language mathematically, and (2) using the model's apparatus to obtain information useful for designing the language. For this, the full abstraction criterion of testing a model against a language is viewed as a feedback loop that suggests language changes. The results from radically different models, for a variety of control manipulation languages, uniformly emphasize the need for delimiting control actions. In the case of higher-order control, this takes the form of a systematic handling of control objects. To check the pragmatics of the new control techniques, we present an implementation and many examples where these delimiters and handlers provide elegant solutions.

Item: Dynamic Multiple Pattern Matching (1994-05)
Idury, Ramana
Pattern matching algorithms are among the most important and practical contributions of theoretical computer science. Pattern matching is used in a wide variety of applications such as text editing, information retrieval, DNA sequencing, and computer vision. Several combinatorial problems arise in pattern matching, such as matching in the presence of local errors, scaling, rotation, compression, and multiple patterns. A common feature shared by many solutions to these problems is the notion of preprocessing the patterns and/or texts prior to the actual matching. We study the problem of pattern matching with multiple patterns. The set of patterns is called a "dictionary." Furthermore, the dictionary can be dynamic in the sense that it can change over time by insertion or deletion of individual patterns. We need to preprocess the dictionary so as to provide efficient searching as well as efficient updates. We first present a solution to the one dimensional version of the problem where the patterns are strings. A salient feature of our solution is a DFA-based searching mechanism similar to the Knuth-Morris-Pratt algorithm. We then use this solution to solve the two dimensional version of the problem where the patterns are restricted to have square shapes. Finally, we solve the general case, where the patterns can have any rectangular shape, by reducing this problem to a range searching problem in computational geometry.
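The search side of the problem can be stated with an Aho-Corasick-style automaton that is naively rebuilt on every dictionary update. The thesis's contribution is precisely to avoid such full rebuilds, so treat the sketch below only as a statement of the behavior being supported; all names are ours.

```python
# Naive dynamic dictionary matching: an Aho-Corasick-style automaton,
# rebuilt from scratch on each insert or delete (illustration only).

from collections import deque

class Dictionary:
    def __init__(self):
        self.patterns = set()
        self._build()

    def insert(self, p):
        self.patterns.add(p)
        self._build()

    def delete(self, p):
        self.patterns.discard(p)
        self._build()

    def _build(self):
        self.goto, self.out, self.fail = [{}], [set()], [0]
        for p in self.patterns:                  # trie of all patterns
            s = 0
            for ch in p:
                if ch not in self.goto[s]:
                    self.goto.append({})
                    self.out.append(set())
                    self.fail.append(0)
                    self.goto[s][ch] = len(self.goto) - 1
                s = self.goto[s][ch]
            self.out[s].add(p)
        queue = deque(self.goto[0].values())     # failure links by BFS
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                nxt = self.goto[f].get(ch, 0)
                self.fail[t] = nxt if nxt != t else 0
                self.out[t] |= self.out[self.fail[t]]

    def search(self, text):
        s, hits = 0, []
        for i, ch in enumerate(text):
            while s and ch not in self.goto[s]:
                s = self.fail[s]
            s = self.goto[s].get(ch, 0)
            hits.extend((i, p) for p in self.out[s])
        return hits

d = Dictionary()
for p in ("he", "she", "hers"):
    d.insert(p)
print(d.search("ushers"))   # "she" and "he" end at index 3, "hers" at 5
```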
Item: Interprocedural Symbolic Analysis (1994-05)
Havlak, Paul
Compiling for efficient execution on advanced computer architectures requires extensive program analysis and transformation. Most compilers limit their analysis to simple phenomena within single procedures, limiting effective optimization of modular codes and making the programmer's job harder. We present methods for analyzing array side effects and for comparing nonconstant values computed in the same and different procedures. Regular sections, described by rectangular bounds and stride, prove as effective in describing array side effects in LINPACK as more complicated summary techniques. On a set of six programs, regular section analysis of array side effects gives 0 to 39 percent reductions in array dependences at call sites, with 10 to 25 percent increases in analysis time. Symbolic analysis is essential to data dependence testing, array section analysis, and other high-level program manipulations. We give methods for building symbolic expressions from gated single-assignment form and simplifying them arithmetically. On a suite of 33 scientific Fortran programs, symbolic dependence testing yields reductions of 0 to 32 percent in the number of array dependences, as compared with constant propagation alone. The additional time and space requirements appear proportional to the size of the codes analyzed. Interprocedural symbolic methods are essential in enabling array section analysis and other advanced techniques to operate on multiple procedures. Our implementation provides this support while allowing recompilation analysis to approximate the incrementalism of separate compilation. However, direct improvement of data dependence graphs from interprocedural symbolic facts is rare in the programs studied. Overall, the use of our symbolic techniques in a production compiler is justified by their efficiency, their direct enhancement of dependence testing within procedures, and their indirect improvement of interprocedural dependence testing through array side effect analysis.
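The kind of question symbolic dependence testing answers can be miniaturized: keep subscripts as linear symbolic expressions and compare them by subtraction. The representation below is ours, not the thesis's, and handles only affine expressions over named symbols.

```python
# Miniature symbolic comparison for dependence testing: subscripts are maps
# from symbol to coefficient, with 'const' for the constant term.

from collections import Counter

def sym(**coeffs):
    """Linear expression, e.g., sym(i=1, n=1, const=1) for i + n + 1."""
    return Counter(coeffs)

def diff(e1, e2):
    d = Counter(e1)
    d.subtract(e2)
    return {s: c for s, c in d.items() if c != 0}

def never_equal(e1, e2):
    """True if e1 - e2 simplifies to a nonzero constant: the subscripts can
    never coincide, so the two accesses carry no dependence."""
    d = diff(e1, e2)
    return set(d) <= {'const'} and d.get('const', 0) != 0

a1 = sym(i=1, n=1)           # a(i + n)
a2 = sym(i=1, n=1, const=1)  # a(i + n + 1)
a3 = sym(i=1, m=1)           # a(i + m)
print(never_equal(a1, a2))   # True: they always differ by exactly 1
print(never_equal(a1, a3))   # False: equality depends on whether n == m
```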
Item: Practical Soft Typing (1994-08)
Wright, Andrew
Soft typing is an approach to type checking for dynamically typed languages. Like a static type checker, a soft type checker infers syntactic types for identifiers and expressions. But rather than reject programs containing untypable fragments, a soft type checker inserts explicit run-time checks to ensure safe execution. Soft typing was first introduced in an idealized form by Cartwright and Fagan. This thesis investigates the issues involved in designing a practical soft type system. A soft type system for a purely functional, call-by-value language is developed by extending the Hindley-Milner polymorphic type system with recursive types and limited forms of union types. The extension adapts Remy's encoding of record types with subtyping to union types. The encoding yields more compact types and permits more efficient type inference than Cartwright and Fagan's early technique. Correctness proofs are developed by employing a new syntactic approach to type soundness. As the type inference algorithm yields complex internal types that are difficult for programmers to understand, a more familiar language of presentation types is developed along with translations between internal and presentation types. To address realistic programming languages like Scheme, the soft type system is extended to incorporate assignment, continuations, pattern matching, data definition, records, modules, explicit type annotations, and macros. Imperative features like assignment and continuations are typed by a new, simple method of combining imperative features with Hindley-Milner polymorphism. The thesis shows soft typing to be practical by illustrating a prototype soft type system for Scheme. Type information determined by the prototype is sufficiently precise to provide useful diagnostic aid to programmers and to effectively minimize run-time checking. The type checker typically eliminates 90% of the run-time checks that are necessary for safe execution with dynamic typing. This reduction in run-time checking leads to significant speedup for some benchmarks. Through several examples, the thesis shows how prototypes, developed using a purely semantic understanding of types as sets of values, can be transformed into robust, maintainable, and efficient programs by rewriting them to accommodate better syntactic type assignment.