Browsing by Author "Thompson, James R."
Now showing 1 - 20 of 36
Item: A Model Based Examination of AIDS: Its Causes and Probable Course (1987-03)
Thompson, James R.
A customary approach to the control of contagious diseases in contemporary America is via medical intervention, either by preventive vaccination or by the use of antibiotics. Historically, sociological control of epidemics has been the more customary method. This has been due, in part, to the fact that vaccines were unknown before the nineteenth century and antibiotics before the twentieth century.

Item: A nonparametric regression algorithm for time series forecasting applied to daily maximum urban ozone concentrations (1989)
Sanchez, Rolando Pena; Thompson, James R.
Using techniques of nonparametric regression, we develop a nonparametric approach in the context of kernel estimation to produce short-term forecasts of time series. This procedure is applied to a daily maximum ozone ($O_3$) series, whose values were filtered according to the Tukey (biweight) kernel function $K(x) = \frac{15}{16}(1 - x^2)^2\, I_{(-1,1)}(x)$. Some parametric approaches such as multivariate regression and autoregressive integrated moving average (ARIMA) models (under assumptions of normality, stationarity, invertibility, etc.) are also shown and compared with the nonparametric approach, which is an attractive alternative. Moreover, a procedure for the estimation of missing observations in time series and a method to improve the optimal "bandwidth" selection for the nonparametric regression kernel estimator are proposed.

Item: A Simulation-based Approach to Study Rare Variant Associations Across the Disease Spectrum (2013-09-16)
Banuelos, Rosa; Kimmel, Marek; Leal, Suzanne; Thompson, James R.; Nakhleh, Luay K.
Although complete understanding of the mechanisms of rare genetic variants in disease continues to elude us, Next Generation Sequencing (NGS) has facilitated significant gene discoveries across the disease spectrum.
However, the cost of NGS hinders its use for identifying rare variants in common diseases that require large samples. To circumvent the need for larger samples, designing efficient sampling studies is crucial in order to detect potential associations. This research therefore evaluates sampling designs for rare-variant quantitative-trait association studies and assesses the effect on power that freely available public cohort data can have in the design. Performing simulations and evaluating common and unconventional sampling schemes yields several noteworthy findings. Specifically, the extreme-trait design is the most powerful design for analyzing quantitative traits. This research also shows that sampling more individuals from the extreme of clinical interest does not increase power. Variant filtering has served as a "proof-of-concept" approach for the discovery of disease-causing genes in Mendelian traits, and formal statistical methods have been lacking in this area. However, combining variant filtering schemes with existing rare variant association tests is a practical alternative. Thus, this thesis also compares the robustness of six burden-based rare variant association tests for Mendelian traits after a variant filtering step in the presence of genetic heterogeneity and genotyping errors. This research shows that with low locus heterogeneity, these tests are powerful for testing association. With the exception of the weighted sum statistic (WSS), the tests were very conservative in preserving the type I error when the numbers of affected and unaffected individuals were unequal. The WSS, on the other hand, had inflated type I error as the number of unaffected individuals increased.
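As an illustration of the burden-based tests discussed here, the following is a minimal sketch of a Madsen-Browning-style weighted sum statistic. The weighting scheme, the pseudo-count, and the toy genotype data are illustrative assumptions, not the exact implementation evaluated in the thesis.

```python
import numpy as np

def weighted_sum_scores(geno_aff, geno_unaff):
    """Madsen-Browning-style weighted sum scores.

    geno_* : (individuals x variants) arrays of minor-allele counts (0/1/2).
    Rare variants are up-weighted by the inverse of their estimated
    standard deviation in the unaffected group (with a pseudo-count so
    unobserved variants do not receive infinite weight).
    """
    n_u = geno_unaff.shape[0]
    q = (geno_unaff.sum(axis=0) + 1) / (2 * n_u + 2)  # allele frequency
    w = 1.0 / np.sqrt(n_u * q * (1 - q))
    return geno_aff @ w, geno_unaff @ w

def rank_sum_statistic(scores_aff, scores_unaff):
    """Sum of the affected individuals' ranks in the pooled sample; in
    practice its null distribution is obtained by permuting labels."""
    pooled = np.concatenate([scores_aff, scores_unaff])
    ranks = pooled.argsort().argsort() + 1.0
    return ranks[: len(scores_aff)].sum()

rng = np.random.default_rng(0)
aff = rng.binomial(2, 0.02, size=(50, 20))    # toy genotypes, hypothetical
unaff = rng.binomial(2, 0.01, size=(50, 20))
s_a, s_u = weighted_sum_scores(aff, unaff)
print(rank_sum_statistic(s_a, s_u))
```

A permutation loop over case/control labels would turn the statistic into the p-value used for the type I error comparisons described above.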
The framework presented can serve as a catalyst to improve sampling design and to develop robust statistical methods for association testing.

Item: A stochastic approach to prepayment modeling (1996)
Overley, Mark S.; Thompson, James R.
A new type of prepayment model for use in the valuation of mortgage-backed securities is presented. The model is based on a simple axiomatic characterization of the prepayment decision by the individual in terms of a continuous-time, discrete-state stochastic process. One advantage of the stochastic approach compared to a traditional regression model is that information on the variability of prepayments is retained. This information is shown to have a significant effect on the value of mortgage-backed derivative securities. Furthermore, the model explains important path-dependent properties of prepayments, such as seasoning and burnout, in a natural way, which improves fit accuracy for mean prepayment rates. This is demonstrated by comparing the stochastic mean to a nonlinear regression model based on time and mortgage rate information for generic Ginnie Mae collateral.

Item: A Stochastic Model Providing a Rationale for Adjuvant Chemotherapy (1986-08)
Thompson, James R.; Brown, Barry W.
A model yielding the probability of curative outcome for a patient at the time of tumor detection is presented. The status of the patient is determined by whether or not metastases (distant spread of the tumor) have occurred and whether any such metastases are drug resistant. If there are no metastases, then local excision is presumed curative; if there are nonresistant metastases, then local excision plus adjuvant chemotherapy is presumed curative; if any metastases are drug resistant, there is no cure. Metastases and drug resistance arise independently with intensities proportional to total tumor size.
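The model just stated lends itself to a direct Monte Carlo reading. The sketch below, with invented growth and intensity parameters (not those of the paper), treats metastasis and resistance events as independent Poisson counts whose means are proportional to the integrated tumor burden up to detection:

```python
import numpy as np

def cure_probabilities(a, b, n_detect=1e9, growth=0.01, n_sim=20000, seed=1):
    """Monte Carlo sketch: a tumor grows exponentially from one cell to
    n_detect cells; metastases and resistant metastases arise as Poisson
    events with intensities a and b times the current tumor size, so the
    expected counts are proportional to the integrated tumor burden.
    Returns P(cure by excision), P(cure with adjuvant chemo), P(no cure)."""
    rng = np.random.default_rng(seed)
    # integrated burden of exponential growth: int_0^T e^{gt} dt = (N - 1)/g
    burden = (n_detect - 1.0) / growth
    n_meta = rng.poisson(a * burden, n_sim)     # metastases per patient
    n_resist = rng.poisson(b * burden, n_sim)   # resistant metastases
    excision = (n_meta == 0) & (n_resist == 0)  # surgery alone cures
    chemo = (n_meta > 0) & (n_resist == 0)      # adjuvant chemo cures
    no_cure = n_resist > 0                      # a resistant metastasis
    return excision.mean(), chemo.mean(), no_cure.mean()

print(cure_probabilities(a=1e-11, b=1e-12))
```

Sweeping `a` and `b` over several orders of magnitude reproduces the kind of comparison the abstract describes: the gap between the first and second probabilities is the benefit attributable to adjuvant therapy.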
Over a wide range of such intensities, the addition of adjuvant drug therapy yields a dramatic improvement in the probability of cure.

Item: An automatic algorithm for the estimation of mode location and numerosity in general multidimensional data (1995)
Elliott, Mark Nathan; Thompson, James R.
Exploratory data analysis in four or more dimensions presents many challenges that are unknown in lower dimensionalities. The emptiness of high-dimensional space makes merely locating the regions in which data are concentrated a nontrivial task. A nonparametric algorithm has been developed which determines the number and location of modes in a multidimensional data set. This algorithm appears to be free of the major disadvantages of standard methods. The procedure can be used in data exploration and can also automatically and nonparametrically test for multimodality. The algorithm performs well in several applications. In particular, it suggests that the Fisher-Anderson iris data, which contain three species, have four modes.

Item: Comparison between an algorithm for kernel estimation and an algorithm for variable kernel estimation (1978)
Chavarria, Silvia; Thompson, James R.
This work studies two different algorithms for nonparametric density estimation. The first algorithm, based on kernel density estimators, gives a method for approximating the optimal parameter. The second algorithm works with variable kernel estimators, finding optimal parameters for this estimator. Very good results were obtained with the first algorithm on the samples studied. With the second, several problems were found, both with the implementation of the proposed algorithm and with the algorithm itself.

Item: Comparison of data-based methods for non-parametric density estimation (1979)
Factor, Lynette Ethel; Thompson, James R.; Scott, David W.; Gorry, G. Anthony
There have been recent developments in data-based methods for estimating densities non-parametrically.
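Several of the items above rest on kernel density estimation. A minimal univariate sketch using the Tukey biweight kernel quoted earlier in this listing (the bandwidth and sample are arbitrary choices for illustration):

```python
import numpy as np

def biweight_kde(grid, data, h):
    """Fixed-bandwidth kernel density estimate with the biweight kernel
    K(x) = (15/16)(1 - x^2)^2 on (-1, 1)."""
    u = (grid[:, None] - data[None, :]) / h
    k = np.where(np.abs(u) < 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)
    return k.sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(2)
sample = rng.normal(0.0, 1.0, 500)
grid = np.linspace(-4.0, 4.0, 161)
dens = biweight_kde(grid, sample, h=0.6)
print(dens.sum() * (grid[1] - grid[0]))   # total mass, close to 1
```

The data-based methods compared in the 1979 item differ precisely in how they choose `h` from the sample; a variable kernel estimator, as in the 1978 item, would let `h` vary with each data point instead of being fixed.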
In this work we shall compare some methods developed by Scott, Duin and Wahba according to their sensitivity, statistical accuracy and cost of implementation when applied to one-dimensional data sets. We shall illustrate the limitations and tradeoffs of each method. The estimates obtained by each method will also be compared to the maximum likelihood univariate Gaussian estimate. We shall also illustrate the application of Duin's method to two-dimensional data sets and compare the results to the maximum likelihood bivariate Gaussian estimate.

Item: Design and Validation of Ranking Statistical Families for Momentum-Based Portfolio Selection (2013-07-24)
Tooth, Sarah; Thompson, James R.; Dobelman, John A.; Williams, Edward E.
In this thesis we evaluate the effectiveness of using daily return percentiles and power means as momentum indicators for quantitative portfolio selection. The statistical significance of momentum strategies has been well established, but in this thesis we select the portfolio size and holding period based on current (2012) trading costs and capital gains tax laws for an individual in the United States to ensure the viability of using these strategies. We conclude that the harmonic mean of daily returns is a superior momentum indicator for portfolio construction over the 1970-2011 backtest period.

Item: Estimation of the parameters of all-pole sequences corrupted by additive observation noise (1983)
McGinn, Darcy; Johnson, Don H.; Thompson, James R.; Parks, Thomas W.
Ordinary Least Squares procedures and the equivalent Yule-Walker formulation result in biased estimates of all-pole model parameters when applied to noise-corrupted all-pole sequences. This bias is shown to be proportional to the inverse of the signal-to-noise ratio. The algorithm investigated applies an autocorrelation-like operation to the noise-corrupted all-pole sequence, which increases the signal-to-noise ratio but preserves the pole locations.
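One way to see why such an operation helps: for white observation noise, the noise contributes to the autocorrelation only at lag zero, so Yule-Walker-type equations written on lags beyond zero see an effectively cleaner all-pole sequence. The sketch below shows a single pass of this idea on a toy AR(2) signal; the parameters and the modified Yule-Walker variant are illustrative, not the recursive algorithm of the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)
a1, a2 = 1.6, -0.9                 # toy AR(2), poles near the unit circle
n = 4000
x = np.zeros(n)
e = rng.normal(size=n)
for t in range(2, n):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
y = x + rng.normal(scale=0.5 * x.std(), size=n)   # additive white noise

# biased sample autocorrelation, positive lags only
r = np.correlate(y, y, mode="full")[n - 1:] / n

# Yule-Walker equations written on lags 3 and 4 only, so the
# noise-inflated lag-0 term r[0] never enters the system
A = np.array([[r[2], r[1]],
              [r[3], r[2]]])
a_hat = np.linalg.solve(A, np.array([r[3], r[4]]))
print(a_hat)   # close to the true (1.6, -0.9) despite the noise
```

Treating `r` itself as a new "signal" and repeating the operation is the recursive step the abstract describes.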
This operation is applied recursively until an acceptable signal-to-noise ratio is obtained. The all-pole parameters are then estimated from the high signal-to-noise ratio sequence using an Ordinary Least Squares estimator. The improvement in signal-to-noise ratio varies for different modes in an all-pole sequence, with modes corresponding to pole locations close to the unit circle showing the most improvement. A signal-to-noise ratio cutoff exists below which no improvement in signal-to-noise ratio is possible for a given mode. This cutoff depends on the radius of the poles of the mode and goes to zero as the pole approaches the unit circle. The signal-to-noise ratio cutoff also corresponds to the point at which the mode's peak spectral value just equals the level of the noise floor. Estimates of the poles from the high signal-to-noise ratio sequences show a reduction in the noise-induced bias concomitant with the increased signal-to-noise ratio. Applying the correlation operation up to four times is shown to be advantageous. The sensitivity of the successive autocorrelation algorithm to the white observation noise assumption is found to be small. With long correlation length signals, such as sinusoids, unbiased low-variance estimates of the parameters are possible at signal-to-noise ratios as low as 0.1.

Item: Estimation techniques in non-stationary renewal processes (1980)
Swami, Ananthram; Johnson, Don H.; Parks, Thomas W.; Thompson, James R.
The multiplicative intensity model for the intensity function $\mu(t; N(t); w) = v(t)\,r(t - w_{N(t)})$ of a self-exciting point process is analyzed in terms of the distortion of $v(t)$ by the channel $r(x)$. A convenient and common method of presenting point process data, the post-stimulus histogram (PST), is shown to be related to the ensemble average of the intensity process and hence incorporates stimulus $v(\cdot)$ as well as refractory $r(\cdot)$ related effects. This quantity is not usually amenable to closed-form representation.
We propose an approximation to the PST which is reasonably good under specified conditions. A maximum likelihood estimator of $r(x)$, where $v(t)$ is known, is derived. A maximum likelihood estimator of $v(t)$, given $r(x)$, is also derived. This estimator is meaningful only when the signal $v(t)$ is known to be periodic. The maximum likelihood estimator compensates for relative dead-time effects. We propose an iterative dead-time processor which, operating on the histogram obtained from the maximum likelihood estimate, partially compensates for absolute dead-time effects. The performance of these estimators is compared with that of other procedures. Applications to spike trains recorded from auditory neurons are discussed.

Item: Futures prices: Data mining and modeling approaches (2000)
Lawera, Martin Lukas; Thompson, James R.
We present a series of models capturing the non-stationarities and dependencies in the variance of yields on natural gas futures. Both univariate and multivariate models are explored, based on ARIMA and hidden Markov methodologies. The models capture the effects uncovered through various data mining techniques, including seasonality, age and transaction-time effects. Such effects have been previously described in the literature, but never comprehensively captured for the purpose of modeling. In addition, we have investigated the impact of temporal aggregation by modeling both the daily and the monthly data. The issue of aggregation has not been explored in the current literature, which has focused on daily data with uniformly underwhelming results. We have shown that modifications to current models to allow aggregation lead to improvements in performance.
This is demonstrated by comparing the proposed models to the models currently used in the financial markets.

Item: Identifying and Dealing with the Approach of Bears and their Departure (2013-05-29)
Affinito, Ricardo; Thompson, James R.; Ensor, Katherine B.; Williams, Edward E.
Based on the identification of market dynamics, capital allocation in long positions can be dynamically controlled by interrupting an otherwise strictly long investment strategy, allowing for an improved overall risk profile and faster response times during periods of persistent negative market returns. We propose a portfolio selection methodology that updates a reasonably diversified selection of competing S&P 500 constituents within and across various predefined industry groups, and that produced above-average long-term returns with minimized downside risk. Within the various predefined groups of stocks, Simugram methods are used to model and optimize the distribution of returns up to and including a horizon of interest. Improvements over previous methods focus on calibrating the sampling distribution to an empirical dataset within the various groups comprising the investor's portfolio, optionally allowing for a sampling frequency that varies as dictated by the group dynamics. By combining within-group optimization with the capability of exiting aggressive long strategies at seemingly riskier times, the focus is on providing more frequent updates to a list of constituents with improved performance in terms of both risk and return.

Item: Market outperformance by nonparametric, simugram-based portfolio selection (2004)
Dobelman, John August; Thompson, James R.; Williams, Edward E.
A new portfolio selection system is presented which weights components in a target major market index such that the resulting portfolio consistently outperforms the underlying market index by most any multi-period return measure.
This is accomplished by use of the simugram, which gives a simulation-based distribution of outcomes of a stochastic experiment. This distribution is time- or space-indexed and presents the whole distribution instead of a few moments. When applied to financial engineering problems, it provides a time-indexed risk profile of positions, which is applied as the objective function in the nonlinear optimization of portfolio weights. This technique is in contrast to the mean-variance selection model, which seeks to minimize portfolio variance subject to a target return. The simugram-based selection system maximizes portfolio return subject to a nonlinear risk tolerance parameter based on the simugram risk profile of all possible portfolio outcomes. For the S&P 100 stock index portfolio in the 33-year study period, using the multi-period return measures of annualized return and terminal value, the simugram annualized return is on the order of 3 times that of the market benchmark. And for every $1 million the market returned in terminal value over this time, the simugram portfolio returned $45 million.

Item: Market truths: theory versus empirical simulations (2006)
Wojciechowski, William C.; Thompson, James R.

Item: Marketplace Competition in the Personal Computer Industry (Wiley, 1992)
Bridges, Eileen; Ensor, Katherine B.; Thompson, James R.
A decision regarding development and introduction of a potential new product depends, in part, on the intensity of competition anticipated in the marketplace. In the case of a technology-based product such as a personal computer (PC), the number of competing products may be very dynamic and consequently uncertain. We address this problem by modeling growth in the number of new PCs as a stochastic counting process, incorporating product entries and exits.
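A counting process with entries and exits of this kind can be sketched very simply; the entry rate and exit fraction below are invented for illustration and are not the fitted values from the paper:

```python
import numpy as np

def simulate_market(n0=60, entry_rate=12.0, exit_frac=0.15,
                    years=5, n_sim=5000, seed=4):
    """Counting-process sketch for the number of competing products:
    each year brings Poisson(entry_rate) new entries, and each existing
    product independently exits with probability exit_frac."""
    rng = np.random.default_rng(seed)
    n = np.full(n_sim, n0)
    for _ in range(years):
        entries = rng.poisson(entry_rate, n_sim)
        exits = rng.binomial(n, exit_frac)
        n = n + entries - exits
    return n

paths = simulate_market()
print(paths.mean(), np.percentile(paths, [5, 95]))
```

The spread of the simulated paths, not just their mean, is what makes such a model useful for the five-year competition forecasts the abstract mentions.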
We demonstrate how to use the resulting model to forecast competition five years in advance.

Item: Modeling carcinogenesis in lung cancer: Taking genetic factors and smoking factor into account (2006)
Deng, Li; Kimmel, Marek; Thompson, James R.
The goal of my thesis is to assess the impacts of cigarette smoking and genetic susceptibility on the onset of lung cancer and to compute the age-specific probability of developing lung cancer given risk factor levels. Improvement in predicting the chance of having lung cancer at a certain age will enhance physicians' capability to design a sensible screening strategy for early tumor detection in a high-risk population. This is the only way to reduce the mortality rate, since no effective treatment or cure is available for advanced lung cancer at this time. The evaluation of the effects of these two risk factors proceeds through parameter estimation in the framework of the two-stage clonal expansion (TSCE) model applied to case-control study data. The TSCE model describes carcinogenesis as transitions from normal cells to slightly abnormal cells and then to cancerous cells. Our data analysis indicates that smoking enhances the proliferation rate, while both smoking and genetic susceptibility affect the initiation and malignant transformation rates. The data suggest that there might be a difference in mechanism in the development of lung cancer for non-smokers and for smokers. Besides predicting survival rates, I rigorously prove the non-identifiability theorem for the TSCE model in the piecewise-constant case and derive a new algorithm for calculating the survival function for a 3-stage, 2-path stochastic model. This 3-stage, 2-path model has two new features: it consists of two stages instead of one for abnormal cells, where one stage is more advanced than the other, and it includes two paths connecting normal cells to cancerous cells. The test of the new model on Texas cancer data shows a very good fit.
Such efforts in developing models that incorporate new findings will lead to a better understanding of the mechanism of carcinogenesis and eventually to the development of drugs to treat cancer.

Item: Modeling the potential impact of HIV on the spread of tuberculosis in the United States (1995)
West, Ronnie Webster; Thompson, James R.
Tuberculosis (TB) was thought to be safely in decline in the United States in the mid-1980s, as the number of cases dropped by 74% between 1953 and 1985. A wake-up call was issued in 1986, when an increase in TB incidence that could not be accounted for was reported. This upward trend has continued. As of the end of 1992, the CDC estimated that 39,000 more cases of TB had developed over the previous decade than if the declining trend present in the early 1980s had continued. This turnaround in TB is well correlated with the rise of the HIV epidemic. The severely depressed immune systems associated with HIV make individuals infected with the virus more likely to develop active TB than those who are not infected. Whereas susceptibles to HIV are generally confined to high-risk groups such as homosexuals or intravenous drug users, this is not the case with TB. It may be that the development of the HIV epidemic has somehow tipped the balance in favor of a continued rise in TB within the United States. The purpose of this work is to investigate, through the use of mathematical models, the magnitude and duration of the effect which the HIV epidemic may have on TB. Deterministic and stochastic models are developed which reflect the transmission dynamics of both TB and HIV, and the relative merits of these models are discussed. The deterministic models are then linked together to form a model for the combined spread of both diseases. A numerical study is performed to investigate the influence of certain key parameters.
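A deterministic linked model of this kind can be sketched as a small ODE system. The compartments and all rates below are illustrative assumptions only (the dissertation's models are richer), but they show the linkage mechanism: latent TB progresses to active disease far faster in the HIV-positive subpopulation.

```python
import numpy as np

def simulate_tb_hiv(years=25.0, dt=0.01, beta=8.0, p_neg=0.001,
                    p_pos=0.08, cure=0.7, hiv_rate=0.002):
    """Euler integration of a toy linked TB/HIV model. State fractions:
    [S-, L-, A-, S+, L+, A+], where S/L/A are TB-susceptible, latent and
    active, and -/+ is HIV status. Latent TB progresses to active disease
    at rate p_neg (HIV-negative) or p_pos (HIV-positive); hiv_rate moves
    individuals from each HIV-negative compartment to its HIV-positive
    counterpart."""
    y = np.array([0.70, 0.29, 0.01, 0.0, 0.0, 0.0])
    for _ in range(int(years / dt)):
        s0, l0, a0, s1, l1, a1 = y
        force = beta * (a0 + a1)          # TB force of infection
        dy = np.array([
            -force * s0 - hiv_rate * s0,
            force * s0 - p_neg * l0 + cure * a0 - hiv_rate * l0,
            p_neg * l0 - cure * a0 - hiv_rate * a0,
            -force * s1 + hiv_rate * s0,
            force * s1 - p_pos * l1 + cure * a1 + hiv_rate * l0,
            p_pos * l1 - cure * a1 + hiv_rate * a0,
        ])
        y = y + dt * dy
    return y

print(np.round(simulate_tb_hiv(), 4))
```

Varying `hiv_rate` and the contact structure implicit in `beta` is the kind of numerical study of key parameters the abstract describes.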
The effect which HIV will have on the general population is found to depend on the contact structure between the general population and the HIV risk groups, as well as on a possible shift in the dynamics associated with TB transmission. The development of a TB epidemic within the HIV risk groups is also considered.

Item: Nobels for nonsense (2005)
Thompson, James R.; Baggett, L. Scott; Wojciechowski, William C.; Williams, Edward E.

Item: Practical and effective methods of simulation based parameter estimation for multidimensional data (1999)
Schwalb, Otto W., III; Thompson, James R.
In 1983, Atkinson, Bartoszynski, Brown, and Thompson proposed a method of parameter estimation referred to as "simulation based estimation", or SIMEST. SIMEST is closely related to maximum likelihood, in that both methods deal with parameter estimation in the context of a fully parametric model. With SIMEST, however, the arduous step of obtaining the density function from a set of model axioms is avoided via simulation. In this dissertation, we extend the ideas of SIMEST to the case of multidimensional data. A nearest-neighbor-based binning scheme is proposed in which the observations are divided into bins determined by the "rings" of concentric ellipsoids, the "rings" being chosen to roughly approximate regions of equal probability. The ellipsoids are each allowed to have different axes, the axes for each ellipsoid being determined by the data. Some theoretical justification is developed which establishes strong consistency for the parameter estimates obtained by this method. The theory also suggests a promising variation on the idea using many overlapping bins. Another theoretical topic, related to the problems associated with global optimization in SIMEST, is also treated. We explore the usefulness of these techniques in modeling the secondary tumor generation mechanisms of cancer.
In one model, it is assumed that 3-dimensional data are available: (a) the time from the detection and removal of the primary tumor to the discovery of the first secondary tumor, (b) the volume of the primary tumor, and (c) the volume of the first secondary tumor. In a second model, it is assumed that two additional dimensions of information are available (i.e., 5-dimensional data): (d) the time from the detection of the first secondary tumor to the detection of the second secondary tumor and (e) the volume of the second secondary tumor.
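The SIMEST recipe described above (simulate from the model at candidate parameters, bin simulated and observed data into roughly equal-probability bins, and score the match) can be sketched in one dimension. The exponential toy model, the quantile binning (a one-dimensional stand-in for the concentric-ellipsoid rings), and the smoothed multinomial criterion are all illustrative assumptions:

```python
import numpy as np

def simest_loss(theta, observed, n_sim=4000, n_bins=10, seed=5):
    """SIMEST-style criterion: simulate from the model at theta, bin both
    samples into quantile bins of the observed data (so each bin holds
    roughly equal observed probability), and score the simulated bin
    frequencies against the observed counts."""
    rng = np.random.default_rng(seed)       # common random numbers
    sim = rng.exponential(1.0 / theta, n_sim)   # toy model: Exp(rate=theta)
    edges = np.quantile(observed, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # cover the whole line
    obs_counts, _ = np.histogram(observed, edges)
    sim_counts, _ = np.histogram(sim, edges)
    p_sim = (sim_counts + 1) / (n_sim + n_bins)  # smoothed frequencies
    return -(obs_counts * np.log(p_sim)).sum()   # multinomial pseudo-NLL

rng = np.random.default_rng(6)
data = rng.exponential(0.5, 1000)                # true rate is 2
grid = np.linspace(0.5, 4.0, 36)
best = grid[np.argmin([simest_loss(t, data) for t in grid])]
print(best)   # should land near the true rate of 2
```

Note that the density of the exponential is never written down in the estimation loop; only the ability to simulate from the model is used, which is the point of SIMEST.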