
Browsing by Author "Ensor, Katherine B"

Now showing 1 - 8 of 8
  • Advances in the Analysis of Spatially Aggregated Data
    (2020-04-23) Schedler, Julia C; Ensor, Katherine B
    An understanding of the spatial relationships in sociological and epidemiological applications is an important tool in the analysis of urban data. While point-level data (e.g., observations at a given latitude/longitude) provide the most detail about spatial phenomena, spatial data aggregated to the level of relevant municipal regions are easily accessible and can provide insights at a level useful for policy decisions by governments and communities. This work identifies two areas of focus in the analysis of spatially aggregated data. First, a new specification for dependence in spatial regression models for aggregated data, based on the Hausdorff distance and extended Hausdorff distance, is introduced. The new dependence structure is shown to account for the shape and orientation of the irregular and disconnected regions often encountered in practice, and it is evaluated in terms of model performance as well as on a real data example. An R package, compatible with existing spatial packages, that implements the construction of spatial weight matrices generated using the (extended) Hausdorff distance is provided, along with a vignette illustrating its use on real data. Second, the idea of a spatial case-crossover model is explored in the context of its connection to existing spatial methods. A method for including spatial dependence in a spatio-temporal case-crossover model is also explored.
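The Hausdorff-distance weight construction described above can be sketched directly. This is only an illustration: the function names are hypothetical, regions are reduced to point sets for simplicity, and the thesis's R package handles real polygon boundaries and the extended Hausdorff distance.

```python
# Sketch: spatial weight matrix from pairwise Hausdorff distances between
# regions represented as point sets (n x 2 arrays). Illustrative only.
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets."""
    # pairwise Euclidean distances between every point of A and every point of B
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def weight_matrix(regions, bandwidth=1.0):
    """Row-standardized exponential-decay weights, zero on the diagonal."""
    n = len(regions)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i, j] = np.exp(-hausdorff(regions[i], regions[j]) / bandwidth)
    # row-standardize so each row sums to one (a common convention)
    return W / W.sum(axis=1, keepdims=True)
```

Unlike centroid distance, the Hausdorff distance between two regions reflects their full extent, which is what lets it capture the shape and orientation of irregular regions.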
  • Computational and Statistical Methodology for Highly Structured Data
    (2020-09-15) Weylandt, Michael; Ensor, Katherine B
    Modern data-intensive research is typically characterized by large-scale data and the impressive computational and modeling tools necessary to analyze it. Equally important, though less remarked upon, is the structure present in large data sets. Statistical approaches that incorporate knowledge of this structure, whether spatio-temporal dependence or sparsity in a suitable basis, are essential to accurately capture the richness of modern large-scale data sets. This thesis presents four novel methodologies for dealing with various types of highly structured data in a statistically rich and computationally efficient manner. The first project considers sparse regression and sparse covariance selection for complex-valued data. While complex-valued data are ubiquitous in spectral analysis and neuroimaging, typical machine learning techniques discard the rich structure of complex numbers, losing valuable phase information in the process. A major contribution of this project is the development of convex analysis for a class of non-smooth "Wirtinger" functions, which allows high-dimensional statistical theory to be applied in the complex domain. The second project considers clustering of large-scale multi-way array ("tensor") data. Efficient clustering algorithms for convex bi-clustering and co-clustering are derived and shown to achieve an order-of-magnitude speed improvement over previous approaches. The third project considers principal component analysis for data with smooth and/or sparse structure. An efficient manifold optimization technique is proposed which can flexibly adapt to a wide variety of regularization schemes while efficiently estimating multiple principal components. Despite the non-convexity of the manifold constraints used, it is possible to establish convergence to a stationary point. Additionally, a new family of "deflation" schemes is proposed to allow iterative estimation of nested principal components while maintaining weaker forms of orthogonality. The fourth and final project develops a multivariate volatility model for US natural gas markets. This model flexibly incorporates differing market dynamics across time scales and spatial locations. A rigorous evaluation shows significantly improved forecasting performance both in- and out-of-sample. All four methodologies flexibly incorporate prior knowledge in a statistically rigorous fashion while maintaining a high degree of computational performance.
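The idea of a deflation scheme for nested principal components can be illustrated with the classical Hotelling deflation; the thesis proposes a new family of such schemes, so this sketch shows only the standard textbook version.

```python
# Sketch: iterative PCA via power iteration with Hotelling's deflation.
# After extracting each component, its contribution is subtracted from the
# covariance so the next power iteration converges to the next component.
import numpy as np

def leading_eigvec(S, iters=500):
    """Leading eigenvector of a PSD matrix via power iteration."""
    v = np.ones(S.shape[0]) / np.sqrt(S.shape[0])
    for _ in range(iters):
        v = S @ v
        v /= np.linalg.norm(v)
    return v

def pca_deflation(X, k):
    """First k principal directions of the data matrix X (rows = samples)."""
    S = np.cov(X, rowvar=False)
    comps = []
    for _ in range(k):
        v = leading_eigvec(S)
        comps.append(v)
        S = S - (v @ S @ v) * np.outer(v, v)  # Hotelling deflation step
    return np.array(comps)
```

Hotelling deflation enforces exact orthogonality of successive components; the "weaker forms of orthogonality" mentioned above refer to relaxations of precisely this step.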
  • Dynamic Characterization of Multivariate Time Series
    (2017-12-01) Melnikov, Oleg; Ensor, Katherine B
    The standard non-negative matrix factorization focuses on batch learning, assuming that fixed global latent parameters completely describe the observations. Many online extensions assume rigid constraints and smooth continuity in observations. However, more complex time series processes can have multivariate distributions that switch between a finite number of states or regimes. In this paper we propose a regime-switching model for non-negative matrix factorization and present a method of forecasting in this lower-dimensional, regime-dependent space. The time-dependent observations are partitioned into regimes to enhance the interpretability of the factors inherent in non-negative matrix factorization. We use weighted non-negative matrix factorization to handle missing values and to avoid needless contamination of the observed structure. Finally, we propose a method of forecasting from the regime components via a threshold autoregressive model, projecting the forecasts back to the original target space. Computation speed is improved by parallelizing the weighted non-negative matrix factorization over multiple CPUs. We apply our model to hourly air quality measurements, building regimes from deterministically identified day and night observations. Air pollutants are then partitioned, factorized, and forecasted, mostly outperforming standard non-negative matrix factorization with respect to the Frobenius norm of the error. We also discuss the shortcomings of the new model.
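The weighted NMF step for handling missing values can be sketched with Lee-Seung-style multiplicative updates under a binary mask; the regime-switching and parallelization aspects of the thesis are omitted, and the function name is illustrative.

```python
# Sketch: weighted NMF with a binary mask M (0 = missing entry), so missing
# values neither influence the factors nor contaminate the observed structure.
import numpy as np

def weighted_nmf(V, M, rank, iters=500, eps=1e-9):
    """Factor V ~ W @ H by multiplicative updates, ignoring entries with M == 0."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, (n, rank))   # positive initialization keeps
    H = rng.uniform(0.1, 1.0, (rank, m))   # all iterates non-negative
    for _ in range(iters):
        WH = W @ H
        H *= (W.T @ (M * V)) / (W.T @ (M * WH) + eps)
        WH = W @ H
        W *= ((M * V) @ H.T) / ((M * WH) @ H.T + eps)
    return W, H
```

Because each update multiplies by a ratio of non-negative terms, the factors stay non-negative without any explicit projection.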
  • Dynamic Multivariate Wavelet Signal Extraction and Forecasting with Applications to Finance
    (2020-04-16) Raath, Kim C; Ensor, Katherine B
    Over the past few years, we have seen an increased need for analyzing the dynamically changing behaviors of economic and financial time series. These needs have led to significant demand for methods that denoise non-stationary time series across time, for specific investment horizons (scales), and within localized windows (blocks) of time. This thesis consists of a three-part series of papers. The first paper develops a wavelet framework for the finance and economics community to quantify dynamic, interconnected relationships between non-stationary time series. The second paper introduces a novel continuous wavelet transform, dynamically optimized, multivariate thresholding method to extract the optimal signal from multivariate time series. Finally, the third paper presents an augmented stochastic volatility wavelet-based forecasting method building on the partial mixture distribution modeling framework introduced in the second paper. Promising results in economics and finance have come from implementing wavelet analysis; however, more advanced wavelet techniques and more robust statistical analysis tools are needed. In support of this expansion effort, we developed a comprehensive and user-friendly R package, CoFESWave, containing our newly developed thresholding and forecasting methods.
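The basic operation underlying wavelet signal extraction can be illustrated with a one-level Haar transform and soft thresholding. This is the textbook version only; the thesis's continuous-wavelet, dynamically optimized multivariate method is far more elaborate.

```python
# Sketch: one-level Haar wavelet denoising with soft thresholding.
import numpy as np

def haar_level1(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # smooth/approximation band
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail band (noise lives here)
    return approx, detail

def inv_haar_level1(approx, detail):
    """Exact inverse of haar_level1."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t; kill anything smaller than t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, t):
    a, d = haar_level1(x)
    return inv_haar_level1(a, soft_threshold(d, t))
```

With threshold zero the transform round-trips exactly; increasing the threshold progressively removes fine-scale fluctuations while leaving the coarse signal intact.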
  • Filtering and Estimation for a Class of Stochastic Volatility Models with Intractable Likelihoods
    (2015-08-28) Vankov, Emilian; Ensor, Katherine B
    A new approach to state filtering and parameter estimation is considered for a class of stochastic volatility models for which the likelihood function is unknown. The alpha-stable stochastic volatility model provides a flexible framework for modeling asymmetry and heavy tails, which is useful when modeling financial returns. However, the alpha-stable distribution lacks a closed-form probability density function, which prevents its direct application to standard filtering and estimation techniques such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC). To circumvent this difficulty, researchers have recently developed various approximate Bayesian computation (ABC) methods, which require only the ability to simulate data from the model. To obtain filtered volatility estimates, we develop a novel ABC-based auxiliary particle filter (APF-ABC). The algorithm can be easily applied to many state-space models whose likelihood function is intractable or computationally expensive. APF-ABC improves accuracy through better proposal distributions in cases where the optimal importance density of the filter is unavailable. Further, a new particle-based MCMC (PMCMC) method is proposed for parameter estimation in this class of volatility models. PMCMC methods combine SMC with MCMC to produce samples from the joint stationary distribution of the latent states and parameters. If full conditional distributions are available for all parameters, the particle Gibbs sampler is typically adopted; otherwise, particle marginal Metropolis-Hastings can be used for posterior estimation. Although several ABC-based extensions of PMCMC have been proposed for the symmetric alpha-stable stochastic volatility model, all have used the particle marginal Metropolis-Hastings algorithm because full conditional distributions cannot be obtained for all parameters of the model. However, the availability of full conditionals for a subset of the parameters raises the natural question of whether some parameters can be estimated using their full conditionals while the others are updated with a Metropolis-Hastings step. The proposed algorithm addresses exactly this question. A simulation study shows that such a strategy can increase the efficiency of the estimation process. Moreover, in contrast to previous work, this thesis studies the asymmetric alpha-stable stochastic volatility model.
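The ABC filtering idea, replacing an intractable likelihood weight with a kernel comparison between simulated and observed data, can be sketched with a bootstrap (not auxiliary) particle filter for a stochastic volatility model with Gaussian noise. All parameter values and names below are illustrative; the thesis targets alpha-stable noise and the auxiliary formulation.

```python
# Sketch: ABC bootstrap particle filter for a toy stochastic volatility model
#   h_t = mu + phi*(h_{t-1} - mu) + sigma*eta_t,   y_t = exp(h_t/2)*eps_t.
# Likelihood evaluation is replaced by simulating a pseudo-observation per
# particle and weighting with a Gaussian kernel on its distance to y_t.
import numpy as np

def abc_filter(y, mu=0.0, phi=0.95, sigma=0.2, n_part=500, h_kernel=0.5, seed=1):
    rng = np.random.default_rng(seed)
    T = len(y)
    h = rng.normal(mu, sigma / np.sqrt(1 - phi**2), n_part)  # stationary init
    est = np.empty(T)
    for t in range(T):
        # propagate latent log-volatility
        h = mu + phi * (h - mu) + sigma * rng.normal(size=n_part)
        # ABC step: simulate pseudo-observations (only simulation is needed) ...
        y_sim = np.exp(h / 2) * rng.normal(size=n_part)
        # ... and weight by a kernel on the distance to the real observation
        w = np.exp(-0.5 * ((y_sim - y[t]) / h_kernel) ** 2)
        w /= w.sum()
        est[t] = np.sum(w * h)                      # filtered mean of h_t
        h = h[rng.choice(n_part, n_part, p=w)]      # multinomial resampling
    return est
```

Nothing in the weighting step evaluates a density of the observation noise, which is exactly why the scheme extends to alpha-stable noise where no closed-form density exists.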
  • Financial time series forecasting via RNNs and Wavelet Analysis
    (2022-04-22) Jackson, Mike Demone; Ensor, Katherine B
    Recent successes in both artificial neural networks (ANN) and wavelets have placed these two methods in the spotlight of quantitative traders seeking the best tool to forecast financial time series. The wavelet neural network (W-NN), a prediction model which combines wavelet-based denoising with an ANN, has successfully combined the two strategies to make accurate predictions of financial time series. We explore how the most recent formulation of the W-NN model, based on the Nonlinear Autoregressive Neural Network with Exogenous variables (NARX), is affected by the choice of wavelet thresholding technique when predicting daily returns of American index futures contracts, and how that choice affects the profitability of two technical trading models built on those predictions. The purpose of this research is twofold: it compares the effect of different wavelet thresholding techniques on a NARX-based W-NN's ability to forecast 1-day returns of American index futures contracts, and it offers two easy-to-implement trading strategies. In the second part of the thesis, we formulate a hybrid NARX-based seasonal predictive model, the Seasonal Nonlinear Autoregressive Neural Network with Exogenous Variables (S-NARX), for end-of-day volume, where end-of-day volume is directly driven by the end-of-day auctions. The S-NARX model seeks to take advantage of the information in the data up until auction time, together with the diurnal seasonal pattern of high-frequency intraday trading volume, to predict end-of-day volume. Volume is well known to be a leading indicator of price changes, and the two metrics are positively correlated. Algorithmic traders rely on accurate volume predictions to deploy trading algorithms, especially when utilizing a Volume Weighted Average Price (VWAP) algorithm, which allows the execution of large orders with minimal slippage. Fundamental and quantitative investors are also interested in trading volume because it is a measure of trading intensity and an indicator of market liquidity. The S-NARX augments the NARX with the feature set of a seasonal ARMA(P,Q)[s], offering quantitative traders a flexible machine learning model for forecasting time series with both longer dependencies and seasonality. Finally, we develop an R package that provides the traditional NARX network along with the novel seasonal CoFES S-NARX, which augments the NARX feature set with the features of an ARMA(P,0)[s]. The networks are built using the Keras framework in R and utilize its sequential model.
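The seasonal-lag feature augmentation behind S-NARX can be sketched as a design-matrix construction: ordinary lags of the target are concatenated with lags at multiples of the seasonal period s. Function and argument names here are illustrative, not from the thesis's R package.

```python
# Sketch: NARX-style lagged features augmented with seasonal lags (period s).
import numpy as np

def narx_features(y, lags, seasonal_lags, s):
    """Rows are [y_{t-1}, ..., y_{t-lags}, y_{t-s}, ..., y_{t-seasonal_lags*s}];
    targets are y_t. The first usable t is set by the deepest lag needed."""
    start = max(lags, seasonal_lags * s)
    rows, targets = [], []
    for t in range(start, len(y)):
        ordinary = [y[t - k] for k in range(1, lags + 1)]
        seasonal = [y[t - k * s] for k in range(1, seasonal_lags + 1)]
        rows.append(ordinary + seasonal)
        targets.append(y[t])
    return np.array(rows), np.array(targets)
```

The resulting matrix can feed any regression model; in the thesis the downstream learner is a neural network built with Keras in R.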
  • Joint Estimation and Selection of Multiple Graphical Models for Microbiome Data
    (2022-08-12) Robinson, Sarah; Ensor, Katherine B; Peterson, Christine B
    The human microbiome, which plays a key role in health and disease, consists of a dynamic community of microorganisms. There is a keen interest in understanding interactions among these microbes, and how these relations change over time. However, current methods for microbiome network inference exist only for a single time point. We propose a novel method to jointly estimate time-varying network associations for microbiome data, which encourages edge similarity across a neighborhood of time points. To account for the compositional constraint and zero-inflation that typify microbiome data sets, we utilize a modified centered-log ratio transformation, then use a truncated Gaussian copula model to estimate the covariance matrices at each time point. We also propose an extension of this method to analyze multi-site or multi-domain microbiome data. We compare the performance of our method to existing alternative approaches on simulated data and apply the proposed method to learn cross-site dynamic networks based on oral and stool microbiome samples collected from leukemic patients during the course of cancer treatment. In the first project, we encountered challenges in selecting reasonably sparse models using traditional model selection criteria. While AIC and BIC, two of the most popular model selection information criteria, attempt to balance model fit and sparsity, selected models still tend to be very dense. Other existing approaches were not well suited to handle the selection of multiple hyperparameters to satisfy multiple objectives. We therefore propose multi-objective optimization to allow the user to filter through the model trade-offs to achieve a more desirable model. In this method, we allow for simultaneous hyperparameter tuning, rather than performing a more traditional grid search. 
In this project, we focus on its use in the selection of both single and joint graphical models, but we note that this method can be generalized to a wide variety of statistical models where competing objectives, such as sparsity or smoothness and fit, need to be optimized. We demonstrate its use for model selection for both the graphical lasso and the joint graphical lasso.
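The multi-objective selection idea, presenting the user with the trade-off set rather than a single criterion's winner, reduces in its simplest form to computing a Pareto front over candidate models scored on competing criteria. The sketch below assumes two minimized objectives (say, lack of fit and edge density); names are illustrative.

```python
# Sketch: keep only Pareto-optimal candidates among (n, k) objective scores.
# A candidate is dominated if another is no worse in every objective and
# strictly better in at least one (lower is better throughout).
import numpy as np

def pareto_front(scores):
    """Indices of non-dominated rows of an (n, k) score array."""
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        dominated = any(
            np.all(scores[j] <= s) and np.any(scores[j] < s)
            for j in range(len(scores)) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep
```

Filtering hyperparameter settings this way replaces a grid search's single winner with the full fit-versus-sparsity trade-off curve for the user to choose from.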
  • Prediction Oriented Marker Selection (PROMISE) for High Dimensional Regression with Application to Personalized Medicine
    (2015-10-27) Kim, Soyeon; Scott, David W.; Lee, J. Jack; Baladandayuthapani, Veerabhadran; Ensor, Katherine B; Nakhleh, Luay K
    In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on a patient's individual biomarker profile. Two important goals of biomarker selection are to choose a small number of important biomarkers that are associated with treatment outcomes and to maintain a high level of prediction accuracy. These goals are challenging because the number of candidate biomarkers can be large compared to the sample size. Established methods for variable selection based on penalized regression, such as the lasso and the elastic net, have yielded promising results. However, selecting the right amount of penalization is critical to maintain the desired properties for both variable selection and prediction accuracy. To select the regularization parameter, cross-validation (CV) is most commonly used; it tends to provide high prediction accuracy as well as a high true positive rate, at the cost of a high false positive rate. Resampling methods such as stability selection (SS) conversely maintain good control of the false positive rate, but at the cost of yielding too few true positives. We propose prediction-oriented marker selection (PROMISE), which combines SS with CV to capture the advantages of both methods. We applied PROMISE to (1) the lasso and (2) the elastic net for individual marker selection, (3) the group lasso for pathway selection, and (4) the combination of the group lasso with the lasso for individual marker selection within the selected pathways. Data analyses show that PROMISE produces a sparser solution than CV, reducing false positives, while giving similar prediction accuracy and true positives. In our simulation and real data analyses, SS does not work well for variable selection and prediction. PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize both type I and type II errors and to maximize prediction accuracy.
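The stability-selection ingredient of PROMISE can be sketched as repeated lasso fits on random subsamples, with variables ranked by how often they are selected. The solver below is a plain ISTA implementation for self-containedness, and the CV combination that defines PROMISE proper is omitted; all names are illustrative.

```python
# Sketch: selection frequencies from lasso fits on random subsamples,
# the core of stability selection. Lasso solved by ISTA (proximal gradient).
import numpy as np

def lasso_ista(X, y, lam, iters=500):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by proximal gradient descent."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ b - y) / n
        z = b - grad / L
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return b

def stability_frequencies(X, y, lam, n_sub=50, frac=0.5, seed=0):
    """Fraction of random subsamples on which each variable enters the lasso."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, int(frac * n), replace=False)
        counts += lasso_ista(X[idx], y[idx], lam) != 0
    return counts / n_sub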