Theses and Dissertations

Permanent URI for this collection

https://hdl.handle.net/1911/112462

A time series approach to quality control
(1991) Dittrich, Gayle Lynn; Ensor, Katherine B.
One way that a process may be said to be "out-of-control" is when a cyclical pattern exists in the observations over time. An accurate control chart is therefore needed to signal when a cycle is present in the process. Two control charts have recently been developed to deal with this problem. One, based on the periodogram, provides a test based on a finite number of frequencies. The other uses a test based on an estimated statistic that covers all frequency values. However, both methods fail to estimate the frequency of the cycle and are computationally difficult. A new control chart is proposed which not only covers a continuous range of frequency values but also estimates the frequency of the cycle. It is also easier to understand and compute than the other two methods.
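For readers unfamiliar with the periodogram-based chart, the following minimal Python sketch computes the classical periodogram at the Fourier frequencies j/n and reports the frequency with the largest ordinate. It illustrates only the finite-frequency periodogram idea, not the continuous-frequency chart proposed in the thesis; the function names and the simulated series are hypothetical.

```python
import math

def periodogram(x):
    """Periodogram ordinates I(f_j) at the Fourier frequencies f_j = j/n, j = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]  # remove the process mean first
    freqs, ords = [], []
    for j in range(1, n // 2 + 1):
        f = j / n
        c = sum(v * math.cos(2 * math.pi * f * t) for t, v in enumerate(xc))
        s = sum(v * math.sin(2 * math.pi * f * t) for t, v in enumerate(xc))
        freqs.append(f)
        ords.append((c * c + s * s) / n)
    return freqs, ords

def dominant_frequency(x):
    """Frequency with the largest periodogram ordinate (a crude cycle-frequency estimate)."""
    freqs, ords = periodogram(x)
    k = max(range(len(ords)), key=ords.__getitem__)
    return freqs[k], ords[k]

# Hypothetical observations: a cycle of period 10 (frequency 0.1) plus a small alternating term.
series = [math.sin(2 * math.pi * 0.1 * t) + 0.1 * (-1) ** t for t in range(100)]
f_hat, peak = dominant_frequency(series)
print(f_hat)  # 0.1
```

Because the search is restricted to the grid j/n, a cycle whose frequency falls between grid points is only located approximately, which is the limitation the abstract's continuous-range chart is meant to address.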
Autocorrelated data in quality control charts
(1994) Hood, Terri Frantom; Ensor, Katherine B.
Control charts are typically developed under the assumption that the process observations are independent. However, the collection of autocorrelated data is common in certain industries. Two approaches for dealing with this issue are investigated. The time series approach fits an appropriate time series model to the data to remove the autocorrelation structure. The EWMA approach models the observations as a weighted average of previous data. The residuals from the two approaches are plotted on control charts, and their average run lengths are compared. Both methods are applied to simulations that generate in-control data and data with strategically located nonstandard conditions. The nonstandard conditions simulated are process change, linear drift, mean shift, and variance shift. The study proposes that the time series approach tends to perform better in these situations.
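A minimal sketch of the EWMA residual idea, assuming the common one-step-ahead form in which the EWMA of past observations serves as the forecast and the forecast errors are charted with 3-sigma limits. The smoothing constant `lam`, the start-up rule z_0 = x_0, and the baseline-based limits are illustrative assumptions, not details taken from the thesis.

```python
import math

def ewma_residuals(x, lam=0.2):
    """One-step-ahead EWMA forecast errors e_t = x_t - z_{t-1},
    where z_t = lam * x_t + (1 - lam) * z_{t-1} and z_0 = x_0 (an assumed start-up rule)."""
    z = x[0]
    residuals = []
    for value in x[1:]:
        residuals.append(value - z)        # forecast error before updating
        z = lam * value + (1 - lam) * z    # EWMA update
    return residuals

def flag_out_of_control(residuals, n_baseline, k=3.0):
    """Indices of residuals outside mean +/- k*sd, with the limits
    estimated from the first n_baseline residuals (assumed in-control)."""
    base = residuals[:n_baseline]
    m = sum(base) / len(base)
    sd = math.sqrt(sum((r - m) ** 2 for r in base) / (len(base) - 1))
    return [i for i, r in enumerate(residuals) if abs(r - m) > k * sd]

print(ewma_residuals([1.0, 2.0, 3.0], lam=0.5))  # [1.0, 1.5]
```

The time series approach would replace the EWMA forecast with the one-step prediction from a fitted model (for example an AR model) and chart those residuals in the same way.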
A stochastic approach to prepayment modeling
(1996) Overley, Mark S.; Thompson, James R.
A new type of prepayment model for use in the valuation of mortgage-backed securities is presented. The model is based on a simple axiomatic characterization of the individual's prepayment decision in terms of a continuous-time, discrete-state stochastic process. One advantage of the stochastic approach over a traditional regression model is that information on the variability of prepayments is retained. This information is shown to have a significant effect on the value of mortgage-backed derivative securities. Furthermore, the model explains important path-dependent properties of prepayments, such as seasoning and burnout, in a natural way, which improves fit accuracy for mean prepayment rates. This is demonstrated by comparing the stochastic mean to a nonlinear regression model based on time and mortgage rate information for generic Ginnie Mae collateral.
Practical methods for data mining with massive data sets
(1998) Salch, John David; Scott, David W.
The increasing size of data sets has necessitated advancement in exploratory techniques. Methods that are practical for moderate to small data sets become infeasible when applied to massive data sets. Advanced techniques such as binned kernel density estimation, tours, and mode-based projection pursuit will be explored. Mean-centered binning will be introduced as an improved method for binned density estimation. The density grand tour will be demonstrated as a means of exploring massive high-dimensional data sets. Projection pursuit by clustering components will be described as a means to find interesting lower-dimensional subspaces of data sets.
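Binned kernel density estimation replaces n kernel evaluations with one weighted kernel per nonempty bin. The sketch below assumes, as an interpretation rather than the thesis's definition, that mean-centered binning places each bin's count at the mean of the points in that bin instead of at the bin center; `binned_kde` and its parameters are hypothetical.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def binned_kde(data, lo, hi, nbins, h, mean_centered=True):
    """Binned Gaussian KDE on [lo, hi]. Each nonempty bin contributes one kernel
    weighted by its count, placed at the bin center (classical binning) or at the
    mean of its points (the assumed 'mean-centered' variant)."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    sums = [0.0] * nbins
    for x in data:
        b = min(int((x - lo) / width), nbins - 1)  # data assumed to lie in [lo, hi]
        counts[b] += 1
        sums[b] += x
    locations = [
        (sums[b] / counts[b] if mean_centered else lo + (b + 0.5) * width, counts[b])
        for b in range(nbins) if counts[b]
    ]
    n = len(data)

    def f_hat(x):
        return sum(c * gaussian((x - loc) / h) for loc, c in locations) / (n * h)

    return f_hat

f = binned_kde([0.1, 0.7], lo=0.0, hi=1.0, nbins=10, h=0.2)
print(f(0.4))
```

When every point falls in its own bin, the mean-centered variant reproduces the exact kernel estimate, one hint of why centering on the bin mean can reduce binning error relative to the bin center.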
An approach to modeling a multivariate spatial-temporal process
(2000) Calizzi, Mary Anne; Ensor, Katherine B.
Although modeling of spatial-temporal stochastic processes is a growing area of research, one underdeveloped area in this field is the multivariate space-time setting. The motivation for this research originates from air quality studies. By treating each air pollutant as a separate variable, the multivariate approach will enable modeling of not only the behavior of the individual pollutants but also the interaction between pollutants over space and time. Studying both the spatial and the temporal aspects of the process gives a more accurate picture of the behavior of the process. A bivariate state-space model is developed and includes a covariance function which can account for the different cross-covariances across space and time. The Kalman filter is used for parameter estimation and prediction. The model is evaluated through the prediction efforts in an air-quality application.
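The thesis uses a bivariate state-space model; as a much simpler illustration of how a Kalman filter delivers both predictions and a likelihood for parameter estimation, here is a univariate local-level filter with known noise variances q and r. The model, the names, and the diffuse prior (p0 = 1e6) are assumptions made for this sketch.

```python
import math

def kalman_local_level(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter for the local-level model
        state:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
        observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns filtered means, one-step-ahead predictions, and the Gaussian log-likelihood."""
    m, p = m0, p0
    filtered, predicted = [], []
    loglik = 0.0
    for obs in y:
        m_pred, p_pred = m, p + q          # predict step
        predicted.append(m_pred)
        v = obs - m_pred                   # innovation
        s = p_pred + r                     # innovation variance
        loglik += -0.5 * (math.log(2 * math.pi * s) + v * v / s)
        k = p_pred / s                     # Kalman gain
        m = m_pred + k * v                 # update step
        p = (1 - k) * p_pred
        filtered.append(m)
    return filtered, predicted, loglik

filtered, predicted, loglik = kalman_local_level([5.0] * 50, q=0.01, r=1.0)
```

Maximizing the returned log-likelihood over (q, r), for example by a simple grid search, is one way to estimate the variances; the multivariate case replaces the scalars with vectors and covariance matrices.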
Futures prices: Data mining and modeling approaches
(2000) Lawera, Martin Lukas; Thompson, James R.
We present a series of models capturing the non-stationarities and dependencies in the variance of yields on natural gas futures. Both univariate and multivariate models are explored, based on the ARIMA and Hidden Markov methodologies. The models capture effects uncovered through various data mining techniques, including seasonality, age, and transaction-time effects. Such effects have been described previously in the literature but never comprehensively captured for the purpose of modeling. In addition, we have investigated the impact of temporal aggregation by modeling both the daily and the monthly data. The issue of aggregation has not been explored in the current literature, which has focused on daily data with uniformly underwhelming results. We have shown that modifying current models to allow aggregation leads to improvements in performance. This is demonstrated by comparing the proposed models to the models currently used in the financial markets.
A comprehensive approach to spatial and spatiotemporal dependence modeling
(2000) Baggett, Larry Scott; Ensor, Katherine B.
One of the most difficult tasks in modeling spatial and spatiotemporal random fields is deriving an accurate representation of the dependence structure. In practice, the researcher must select the best empirical representation of the data, the proper family of parametric models, and, once the model is selected, the most efficient method of parameter estimation. Each of these decisions directly affects the prediction accuracy of the modeled random field. To facilitate spatial dependence modeling, a general class of covariogram estimators is introduced. They are derived by direct application of Bochner's theorem to the Fourier-Bessel series representation of the covariogram. Extensions are derived for one, two, and three dimensions, together with spatiotemporal extensions for one, two, and three spatial dimensions. A spatial application is demonstrated for predicting the distribution of sediment contaminants in the Galveston Bay estuary, Texas. Also included is a spatiotemporal application that generates predictions of sea surface temperatures, adjusted for periodic climatic effects, for a long-term study region off southern California.
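As a point of reference, the sketch below computes the classical method-of-moments empirical covariogram for one-dimensional locations, averaging centered cross-products within lag bins. It is the kind of raw estimate a parametric or series-based covariogram would typically be compared against, not the Fourier-Bessel estimator from the thesis; the function name and binning rule are assumptions.

```python
def empirical_covariogram(locs, values, bin_width, max_lag):
    """Method-of-moments covariogram for 1-D locations: the average of
    (z_i - zbar)(z_j - zbar) over pairs whose separation falls in lag bin
    [k*w, (k+1)*w), for k = 0 .. int(max_lag / w) - 1.
    Returns (mean lag, covariance) for each nonempty bin."""
    n = len(values)
    zbar = sum(values) / n
    nbins = int(max_lag / bin_width)
    prod_sum = [0.0] * nbins
    lag_sum = [0.0] * nbins
    count = [0] * nbins
    for i in range(n):
        for j in range(i, n):  # includes i == j, i.e. the lag-0 (variance) term
            d = abs(locs[i] - locs[j])
            b = int(d / bin_width)
            if b < nbins:
                prod_sum[b] += (values[i] - zbar) * (values[j] - zbar)
                lag_sum[b] += d
                count[b] += 1
    return [(lag_sum[b] / count[b], prod_sum[b] / count[b]) for b in range(nbins) if count[b]]

print(empirical_covariogram([0, 1, 2, 3], [1.0, -1.0, 1.0, -1.0], bin_width=1.0, max_lag=2.0))
# [(0.0, 1.0), (1.0, -1.0)]
```

Raw estimates like these are not guaranteed to be positive definite, which is one reason a representation that satisfies Bochner's theorem by construction is attractive.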
Robust modeling
(2001) Wojciechowski, William Conrad; Scott, David W.
In this data-rich age, datasets often contain many observations and variables. Verifying the quality of a large dataset is a formidable task that cannot be completed by manual inspection. Therefore, methods are needed that perform well automatically even when the dataset contains anomalous data points. Robust procedures are designed to have this type of stability. A new general-purpose robust estimator is introduced. This Bayesian procedure applies Gibbs sampling and data augmentation to achieve robustness by weighting the observations in the likelihood of Bayes' theorem. Because this new estimator relies upon simulation, it has several advantages over existing robust methods. The derivation of the new method is presented along with examples that compare it to existing procedures.
Parameter estimation for discretely observed continuous-time Markov chains
(2001) Cramer, Roxy D.; Ensor, Katherine B.
This thesis develops a method for estimating the parameters of continuous-time Markov chains discretely observed by Poisson sampling. The inference problem in this context is usually simplified by assuming the process to be time-homogeneous and that the process can be observed continuously for some observation period. But many real problems are not homogeneous; moreover, in practice it is often difficult to observe random processes continuously. In this work, the Dynkin Identity motivates a martingale estimating equation which is no more complicated a function of the parameters than the infinitesimal generator of the chain. The time-dependent generators of inhomogeneous chains therefore present no new obstacles. The Dynkin Martingale estimating equation derived here applies to processes discretely observed according to an independent Poisson process. Random observation of this kind alleviates the so-called aliasing problem, which can arise when continuous-time processes are observed discretely. Theoretical arguments exploit the martingale structure to obtain conditions ensuring strong consistency and asymptotic normality of the estimators. Simulation studies of a single-server Markov queue with sinusoidal arrivals test the performance of the estimators under different sampling schemes and against the benchmark maximum likelihood estimators based on continuous observation.
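To make the observation scheme concrete, the sketch below simulates a two-state continuous-time Markov chain and records its state at the arrival times of an independent Poisson process. A homogeneous two-state chain is used for brevity (the thesis studies a single-server queue with sinusoidal arrivals), and the sketch shows only the Poisson sampling, not the Dynkin martingale estimator; all names and rates are hypothetical.

```python
import bisect
import random

def simulate_two_state_chain(rate01, rate10, t_end, rng):
    """Jump times and states of a two-state continuous-time Markov chain started in state 0."""
    times, states = [0.0], [0]
    t, s = 0.0, 0
    while True:
        t += rng.expovariate(rate01 if s == 0 else rate10)  # exponential holding time
        if t >= t_end:
            return times, states
        s = 1 - s
        times.append(t)
        states.append(s)

def poisson_observe(times, states, obs_rate, t_end, rng):
    """Record the chain's state at the arrival times of an independent Poisson(obs_rate) process."""
    observations = []
    t = rng.expovariate(obs_rate)
    while t < t_end:
        k = bisect.bisect_right(times, t) - 1  # index of the last jump at or before t
        observations.append((t, states[k]))
        t += rng.expovariate(obs_rate)
    return observations

rng = random.Random(42)
times, states = simulate_two_state_chain(rate01=1.0, rate10=2.0, t_end=100.0, rng=rng)
obs = poisson_observe(times, states, obs_rate=0.5, t_end=100.0, rng=rng)
```

Because the gaps between observations are random rather than fixed, no single sampling interval can systematically alias a periodic component of the process.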
Essays in semiparametric and nonparametric estimation with application to growth accounting
(2001) Jeon, Byung Mok; Brown, Bryan W.
This dissertation develops efficient semiparametric estimation of parameters and expectations in dynamic nonlinear systems and analyzes the role of environmental factors in productivity growth accounting. The first essay considers the estimation of a general class of dynamic nonlinear systems. The semiparametric efficiency bound and efficient score are established for these problems. Using an M-estimator based on the efficient score, the feasible form of the semiparametric efficient estimators is worked out for several explicit assumptions regarding the degree of dependence between the predetermined variables and the disturbances of the model. Building on this result, the second essay develops semiparametric estimation of the expectation of known functions of observable variables and unknown parameters in this class of dynamic nonlinear models. The semiparametric efficiency bound for this problem is established, and an estimator that achieves the bound is worked out for two explicit assumptions. Under the independence assumption, the residual-based predictors proposed by Brown and Mariano (1989) are shown to be semiparametric efficient. Under the unconditional mean-zero assumption, I propose an improved heteroskedasticity and autocorrelation consistent estimator. The third essay explores the directional distance function method to analyze productivity growth. The method explicitly evaluates the role that undesirable outputs of the economy, such as carbon dioxide and other greenhouse gases, play in the frontier production process, which we specify as a piecewise linear and convex boundary function. We decompose productivity growth into efficiency change (catching up) and technology change (innovation). We test the statistical significance of the estimates using a recently developed bootstrap method. We also explore implications for growth of total factor productivity in the OECD and Asian economies.
Venture capital, entrepreneurship, and long-run performance prediction: An application of data mining
(2003) Miller, John Michael; Thompson, James R.; Williams, Edward E.
The critical nature of the venture capital-entrepreneur relationship is emphasized by the 46.4% exponential growth rate of venture capital investments throughout the 1990s. The period between first-stage funding and the venture capitalist's exit offers the venture capitalist the greatest opportunity to influence the outcome of his limited partners' investment. Agency, procedural justice, information, environment, and power theories have been offered to explain the effectiveness of the venture capitalist. The first stage of this study investigates, in light of these theories, how well the entrepreneur's attitudes toward his venture capital partner predict long-term performance. The second and third stages of this analysis focus on internal auditing and environmental factors characterizing the firm at the time of its IPO as predictors of long-term investor wealth appreciation. Data mining involves conducting all three steps in the development of a mathematical model of any phenomenon (structure generation, parameter estimation, and model confirmation) on the same set of data. In this development of a prediction scheme for firm performance, we focus on model generation and preliminary model parameter estimation. The data for these analyses were obtained from a 1990 survey of top management of 145 venture capital funded enterprises, plus SEC filings on 563 Initial Public Offerings (IPOs) issued in 1997, stock market prices, and public accounting data. Both sets of data are treated according to an operational measurement theory rather than the traditional representational mode.
As a result: (1) the entrepreneur's appreciation for strategic information and new-idea support from his venture capitalist is found to be predictive of subsequent business performance, in the form of successful IPO or merger/acquisition harvests; (2) routine application of non-parametric methods to wealth appreciation data for 1997-2001 casts doubt on the characterization of that period as a "boom," while confirming the anomaly of IPO underperformance; and (3) accounting data available at the time of the IPO may be able to predict stock market performance three years later.
Gaussian mixture regression and classification
(2004) Sung, Hsi Guang; Scott, David W.
The sparsity of high-dimensional data spaces renders standard nonparametric methods ineffective for multivariate data. A new procedure, Gaussian Mixture Regression (GMR), is developed for multivariate nonlinear regression modeling. GMR has the tight structure of a parametric model yet retains the flexibility of a nonparametric method. The key idea of GMR is to construct a sequence of Gaussian mixture models for the joint density of the data and then derive conditional density and regression functions from each model. Assuming the data are a random sample from the joint pdf $f_{X,Y}$, we fit a Gaussian kernel density model $\hat{f}_{X,Y}$ and then implement a multivariate extension of the Iterative Pairwise Replacement Algorithm (IPRA) to simplify the initial kernel density. IPRA generates a sequence of Gaussian mixture density models indexed by the number of mixture components K. The regression functions of these density models form a sequence of regression models of varying flexibility, ranging from approximately the classical linear model (K = 1) to the nonparametric kernel regression estimator (K = n). We use mean squared error and prediction error to select K. For binary responses, we extend GMR to fit nonparametric logistic regression models. Applying IPRA to each class density, we obtain two families of mixture density models. The logistic function can then be estimated by the ratio between pairs of members from each family. The result is a family of logistic models indexed by the number of mixture components in each density model. We call this procedure Gaussian Mixture Classification (GMC). For a given GMR or GMC model, forward and backward projection algorithms are implemented to locate the optimal subspaces that minimize information loss. They serve as model-based dimension reduction techniques for GMR and GMC.
In practice, GMR and GMC offer data analysts a systematic way to determine the appropriate level of model flexibility by choosing the number of components for modeling the underlying pdf. GMC can serve as an alternative or a complement to Mixture Discriminant Analysis (MDA). The uses of GMR and GMC are demonstrated in simulated and real data.
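For a single predictor, the regression function of a fitted Gaussian mixture with weights $\pi_k$ is $E[Y \mid X = x] = \sum_k w_k(x)\,[\mu_{Y,k} + (\sigma_{XY,k}/\sigma^2_{X,k})(x - \mu_{X,k})]$ with $w_k(x) \propto \pi_k\,\phi(x; \mu_{X,k}, \sigma^2_{X,k})$. The sketch below evaluates this formula for mixture parameters taken as given (in GMR they would come from IPRA); the component tuples and function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def gmr_predict(x, components):
    """Conditional mean E[Y | X = x] under a bivariate Gaussian mixture.
    Each component is (weight, mu_x, mu_y, var_x, cov_xy)."""
    # Posterior probability that x came from each component, using the X-marginals only
    resp = [w * normal_pdf(x, mx, vx) for w, mx, _, vx, _ in components]
    total = sum(resp)
    pred = 0.0
    for r, (_, mx, my, vx, cxy) in zip(resp, components):
        # Within-component Gaussian regression line: mu_y + (cov_xy / var_x) * (x - mu_x)
        pred += (r / total) * (my + cxy / vx * (x - mx))
    return pred

one = [(1.0, 0.0, 1.0, 2.0, 1.0)]
print(gmr_predict(2.0, one))  # 2.0
two = [(0.5, -10.0, -1.0, 1.0, 0.0), (0.5, 10.0, 1.0, 1.0, 0.0)]
print(gmr_predict(10.0, two))
```

With K = 1 the prediction is a straight regression line, consistent with the K = 1 end of the GMR sequence being approximately the classical linear model. For x far from every component the densities can underflow to zero, so a log-sum-exp version would be more robust in practice.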
Market outperformance by nonparametric, simugram-based portfolio selection
(2004) Dobelman, John August; Thompson, James R.; Williams, Edward E.
A new portfolio selection system is presented which weights components in a target major market index such that the resulting portfolio consistently outperforms the underlying market index by almost any multi-period return measure. This is accomplished by use of the simugram, which gives a simulation-based distribution of outcomes of a stochastic experiment. This distribution is time- or space-indexed and presents the whole distribution instead of a few moments. When applied to financial engineering problems, it provides a time-indexed risk profile of positions, which is used as the objective function in the non-linear optimization of portfolio weights. This technique contrasts with the mean-variance selection model, which seeks to minimize portfolio variance subject to a target return. The simugram-based selection system maximizes portfolio return subject to a non-linear risk tolerance parameter based on the simugram risk profile of all possible portfolio outcomes. For the S&P 100 stock index portfolio over the 33-year study period, using the multi-period return measures of annualized return and terminal value, the simugram annualized return is on the order of 3 times that of the market benchmark. For every $1 million the market returned in terminal value over this period, the simugram portfolio returned $45 million.
Estimating marginal survival in the presence of dependent and independent censoring: With applications to dividend initiation policy
(2005) Fix, Gretchen Abigail; Ensor, Katherine B.; Huang, Xuelin
In many survival analysis settings, the assumption of non-informative (i.e. independent) censoring is not valid. Zheng and Klein (1995, 1996) develop a copula-based method for estimating the marginal survival functions of bivariate dependent competing risks data. We expand upon this earlier work and adapt their method to data in which there are three competing risks representing both dependent and independent censoring. Specifically, our extension allows for the estimation of the survival functions of dependent competing risks X and Y in the presence of a third independent competing risk Z. An application to dividend initiation data is presented.
An examination of some open problems in time series analysis
(2005) Davis, Ginger Michelle; Ensor, Katherine B.
We investigate two open problems in the area of time series analysis. The first is developing a methodology for multivariate time series analysis when the series has components that are both continuous and categorical. Our specific contribution is a logistic smooth transition regression (LSTR) model whose transition variable is related to a categorical variable. This methodology is needed for series that exhibit nonlinear behavior dependent on a categorical variable. The estimation procedure is investigated both through simulation and with an economic example. The second contribution to time series analysis is examining the evolving structure in multivariate time series. The application area we concentrate on is financial time series. Many models exist for the joint analysis of several financial instruments, such as securities, because these instruments are not independent. These models often assume some type of constant relationship between the instruments over the period of analysis. Instead of imposing this assumption, we seek to understand the dynamic covariance structure of our multivariate financial time series, which will provide an understanding of changing market conditions. To achieve this, we first develop a multivariate model for the conditional covariance and then examine that estimate for changing structure using multivariate techniques. Specifically, we simultaneously model individual stocks that belong to one of three market sectors and examine the behavior of the market as a whole as well as the behavior of the sectors. Our aims are to detect and forecast unusual changes in the system, such as market collapses and outliers, and to understand portfolio diversification in multivariate financial series from different industry sectors. The motivation for this research concerns portfolio diversification.
We do not make the false assumption that investments in different industry sectors are uncorrelated. Instead, we assume that the comovement of stocks within and between sectors changes with market conditions, which include market crashes or collapses and common external influences.
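A two-regime LSTR model blends two linear regimes with the logistic transition function $G(s; \gamma, c) = 1/(1 + \exp(-\gamma (s - c)))$. The sketch below assumes an AR(1)-style specification and encodes the categorical variable as s ∈ {0, 1} with c = 0.5; these choices, and the function names, are illustrative assumptions rather than the thesis's exact model.

```python
import math

def logistic_transition(s, gamma, c):
    """G(s; gamma, c) = 1 / (1 + exp(-gamma * (s - c))), computed without overflow."""
    z = gamma * (s - c)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lstr_mean(y_prev, s, phi_low, phi_high, gamma, c):
    """Conditional mean of a two-regime LSTR(1) model:
    E[y_t] = (1 - G) * phi_low * y_{t-1} + G * phi_high * y_{t-1}."""
    g = logistic_transition(s, gamma, c)
    return (1.0 - g) * phi_low * y_prev + g * phi_high * y_prev

# Categorical transition variable coded 0/1, threshold c = 0.5, fairly sharp transition.
print(round(lstr_mean(2.0, 1, 0.2, 0.8, 50.0, 0.5), 6))  # 1.6
```

A large gamma makes the switch between regimes nearly abrupt, so with s coded 0/1 the model behaves almost like a regime indicator; smaller values give a smoother blend of the two autoregressive coefficients.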
Denoising by wavelet thresholding using multivariate minimum distance partial density estimation
(2006) Scott, Alena I.; Scott, David W.
In this thesis, we consider wavelet-based denoising of signals and images contaminated with white Gaussian noise. Existing wavelet-based denoising methods are limited because they make at least one of the following three unrealistic assumptions: (1) the wavelet coefficients are independent, (2) the signal component of the wavelet coefficient distribution follows a specified parametric model, and (3) the wavelet representations of all signals of interest have the same level of sparsity. We develop an adaptive wavelet thresholding algorithm that addresses each of these issues. We model the wavelet coefficients with a two-component mixture in which the noise component is Gaussian but the signal component need not be specified. We use a new density estimation technique that minimizes an L2 distance criterion (L2E) to estimate the parameters of the partial density that represents the noise component. The L2E estimate of the weight of the noise component, $\hat{w}_{L2E}$, determines the fraction of wavelet coefficients that the algorithm considers noise; we show that $\hat{w}_{L2E}$ corresponds to the level of complexity of the signal. We also incorporate information on inter-scale dependencies by modeling across-scale (parent/child) groups of adjacent coefficients with multivariate densities estimated by L2E. To assess the performance of our method, we compare it to several standard wavelet-based denoising algorithms on a number of benchmark signals and images. We find that our method incorporating inter-scale dependencies improves on most of the standard methods and is comparable to the rest. The L2E thresholding algorithm performed very well for 1-D signals, especially those with a considerable amount of high-frequency content. Our method worked reasonably well for images, with some apparent advantage in denoising smaller images.
In addition to providing a standalone denoising method, L2E can be used to estimate the variance of the noise in the signal for use in other thresholding methods. We also find that the L2E estimate of the noise variance is always comparable to, and sometimes better than, the conventional median absolute deviation estimator.
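For comparison with the conventional approach, the sketch below implements one level of an orthonormal Haar transform, the median absolute deviation noise estimate $\hat\sigma = \mathrm{median}(|d|)/0.6745$ on the detail coefficients, and soft thresholding at the universal threshold $\hat\sigma\sqrt{2\log n}$. This is the standard baseline, not the L2E method; the Haar wavelet, the single decomposition level, and the function names are simplifications chosen for brevity.

```python
import math
import random
import statistics

def haar_forward(x):
    """One level of the orthonormal Haar transform; len(x) is assumed even."""
    r = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / r for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / r for i in range(len(x) // 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    r = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / r, (a - d) / r])
    return out

def mad_sigma(detail):
    """Median absolute deviation noise estimate median(|d|) / 0.6745 (detail median taken as 0)."""
    return statistics.median([abs(d) for d in detail]) / 0.6745

def soft_threshold(d, t):
    """Shrink d toward zero by t, setting it to zero when |d| <= t."""
    return math.copysign(max(abs(d) - t, 0.0), d)

def denoise_one_level(x):
    approx, detail = haar_forward(x)
    sigma = mad_sigma(detail)
    t = sigma * math.sqrt(2.0 * math.log(len(x)))  # universal threshold
    return haar_inverse(approx, [soft_threshold(d, t) for d in detail])

rng = random.Random(0)
clean = [1.0] * 32 + [5.0] * 32
noisy = [v + rng.gauss(0.0, 0.5) for v in clean]
denoised = denoise_one_level(noisy)
```

A single threshold applied to all coefficients is exactly the kind of fixed-sparsity assumption the abstract criticizes; the L2E approach instead estimates the noise fraction from the coefficients themselves.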
Statistical models for intraday trading dynamics
(2007) Bhatti, Chad Reyhan; Cox, Dennis D.
Advances in computational power and data storage have spawned a new research area in financial economics and statistics called high-frequency finance. The defining feature of high-frequency finance is the analysis of financial processes over short intraday time horizons. This time horizon may be the trade-by-trade behavior of the market, or it may be locally aggregated behavior over intraday intervals. The analysis of intraday financial processes is motivated by the micro-foundations of aggregate market behavior: it is hoped that micro-level market properties can help explain macro-level market properties. Two topics of particular interest are the statistical modeling of these intraday processes and the temporal aggregation of these intraday statistical models. This dissertation examines the statistical modeling of intraday trading dynamics, in particular the relationship between the trade and quote processes. The effect of trading activity on quoting behavior is one of the central problems in the economic theory of market microstructure. To investigate this relationship at the transaction level, the dynamics of the trade and quote processes for eight securities traded on the New York Stock Exchange (NYSE) are modeled in a market microstructure framework. We begin by defining the EL Model and the EL Model framework developed in Engle and Lunde (2003). We propose an alternative to the EL Model for modeling trade and quote dynamics using the Cox regression model, which has many data-analytic advantages. With the Cox regression model we are able to perform a thorough statistical analysis of transaction-level trade and quote behavior. We conclude by investigating a local Poisson approximation of intraday trade and quote behavior in five-minute intervals using the Poisson generalized linear model with dispersion.
The analysis of limit orders using the Cox proportional hazards model with independent competing risks
(2008) Kenney, Colleen; Ensor, Katherine B.
I apply the Cox proportional hazards model with independent competing risks to study the hazard rates of executed, cancelled, and partially executed limit orders submitted for Microsoft to the Island ECN over one day. The instantaneous probability of execution increases with decreases in the buy order price but with increases in the sell order price, and with increases in volume on the sell side of the market and in market activity. The probability of cancellation increases with increases in liquidity demand and market activity for buy orders, and with increases in volume on the same side of the market and absolute market activity for sell orders. Finally, the partially executed hazard rate for buy orders increases with increases in price, volume on the opposite side of the market, size, and absolute market activity; for sell orders, the hazard rate increases with increases in volume on the same side of the market, liquidity demanded, and market activity.
Term structures of conditional probabilities of corporate default in an incomplete information setting
(2008) Jabri, Hanane; Riedi, Rudolf H.
With the emergence and expansion of credit derivatives, which are financial instruments that are based on corporate bonds and provide their holders with protection against default, the importance of estimating probabilities of default has reached an unprecedented level. We have developed a Bayesian model to estimate term structures of conditional probabilities of corporate default in an incomplete information setting. In such settings, investors have a complete picture of neither the economy nor the true financial status of a firm. Therefore, we introduce a stochastic frailty to capture this unobservable source of uncertainty and to model default clustering. Frailty is found to have an impact on conditional default probabilities and on the default correlation between firms. The resulting values are well above those predicted by the observable stochastic covariates: US interest rates, US Personal Income, and a firm's distance-to-default.
Estimating realized covariance using high frequency data
(2008) Kyj, Lada Maria; Ensor, Katherine B.
Assessing the economic value of increasingly precise covariance estimates is of great interest in finance. We present a realized tick-time covariance estimator that incorporates cross-market tick-matching and intelligent sub-sampling. These features offer the potential for improved performance in the presence of asynchronicity and market microstructure noise. Specifically, tick-matching preserves information when arrival structures are asynchronous, and intelligent sampling and averaging across sub-samples reduce microstructure-induced noise and estimation error. We compare the performance of this estimator with prevailing methodologies in a simulation study and by assessing out-of-sample volatility-timing portfolio optimization strategies. We demonstrate the benefits of tick time over calendar time, optimal sampling over ad hoc sampling, and sub-sampling over sampling. Results show that our estimator has smaller mean squared error, smaller bias, and greater economic utility than prevailing methodologies. Our proposed optimized tick-time estimator improves upon both prevailing calendar-time methods and ad hoc sampling schemes in tick time. Empirical results indicate substantial gains: approximately 70 basis points of improvement against the 5-minute calendar-time sampling scheme, approximately 80 basis points against optimally sampled calendar time, and 30 basis points against tick time sampled every 5th tick. Both simulation and empirical results indicate that tick time is the better sampling scheme for portfolios with illiquid securities. Asset allocation is inherently a high-dimensional problem, and estimated realized covariance matrices fail to be well-conditioned in high dimensions. As a result, the portfolios constructed are far from optimal. Factor modeling offers a solution to both the growing computational complexity and the poor conditioning of the covariance matrices.
We find that risk averse investors would be willing to pay up to 30 basis points annually to switch from the best performing exponentially smoothed portfolio to the best performing single-index portfolio. As the number of assets increases, portfolio allocation using the single-index model is better able to replicate the benchmark index. For high-dimensional allocation problems, factor models are a more natural setting for employing realized covariance estimators.
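The sketch below shows the basic ingredients of a sub-sampled realized covariance: synchronizing two asynchronous price series onto a common grid, summing cross-products of log-returns sampled every `step` points, and averaging that sparse estimator over all starting offsets. Previous-tick synchronization is a common, swapped-in scheme used here for simplicity, not the thesis's cross-market tick-matching, and all function names are hypothetical.

```python
import math

def previous_tick(times, prices, grid):
    """Price in force at each grid time: the last observed price at or before it.
    Assumes times and grid are sorted and grid[0] >= times[0]."""
    out, k = [], 0
    for g in grid:
        while k + 1 < len(times) and times[k + 1] <= g:
            k += 1
        out.append(prices[k])
    return out

def realized_cov(p1, p2, step=1, offset=0):
    """Sum of products of log-returns of two synchronized price series,
    sampling every `step` points starting at `offset`."""
    idx = range(offset, len(p1), step)
    l1 = [math.log(p1[i]) for i in idx]
    l2 = [math.log(p2[i]) for i in idx]
    return sum((l1[i + 1] - l1[i]) * (l2[i + 1] - l2[i]) for i in range(len(l1) - 1))

def subsampled_cov(p1, p2, step):
    """Average of the sparse estimator over all `step` possible starting offsets."""
    return sum(realized_cov(p1, p2, step, o) for o in range(step)) / step

grid = [0.0, 1.0, 2.0, 3.0]
a = previous_tick([0.0, 1.5, 3.0], [10.0, 11.0, 12.0], grid)
b = previous_tick([0.0, 0.5, 2.5], [20.0, 20.5, 21.0], grid)
print(a)  # [10.0, 10.0, 11.0, 12.0]
print(subsampled_cov(a, b, 1))
```

Sparse sampling (a larger `step`) dampens microstructure noise at the cost of discarding returns; averaging over offsets recovers some of the discarded information, which is the trade-off the abstract's sub-sampling targets.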
