R-3 Repository :: Browsing by Author "Kowal, Daniel R."


Now showing 1 - 9 of 9

A Bayesian Multivariate Functional Dynamic Linear Model
(Taylor & Francis, 2017) Kowal, Daniel R.; Matteson, David S.; Ruppert, David
We present a Bayesian approach for modeling multivariate, dependent functional data. To account for the three dominant structural features in the data—functional, time dependent, and multivariate components—we extend hierarchical dynamic linear models for multivariate time series to the functional data setting. We also develop Bayesian spline theory in a more general constrained optimization framework. The proposed methods identify a time-invariant functional basis for the functional observations, which is smooth and interpretable, and can be made common across multivariate observations for additional information sharing. The Bayesian framework permits joint estimation of the model parameters, provides exact inference (up to MCMC error) on specific parameters, and allows generalized dependence structures. Sampling from the posterior distribution is accomplished with an efficient Gibbs sampling algorithm. We illustrate the proposed framework with two applications: (1) multi-economy yield curve data from the recent global recession, and (2) local field potential brain signals in rats, for which we develop a multivariate functional time series approach for multivariate time–frequency analysis. Supplementary materials, including R code and the multi-economy yield curve data, are available online.
Bayesian data synthesis and the utility-risk trade-off for mixed epidemiological data
(Project Euclid, 2022) Feldman, Joseph; Kowal, Daniel R.
Much of the microdata used for epidemiological studies contain sensitive measurements on real individuals. As a result, such microdata cannot be published out of privacy concerns, and without public access to these data, any statistical analyses originally published on them are nearly impossible to reproduce. To promote the dissemination of key datasets for analysis without jeopardizing the privacy of individuals, we introduce a cohesive Bayesian framework for the generation of fully synthetic high-dimensional microdatasets of mixed categorical, binary, count, and continuous variables. This process centers around a joint Bayesian model that is simultaneously compatible with all of these data types, enabling the creation of mixed synthetic datasets through posterior predictive sampling. Furthermore, a focal point of epidemiological data analysis is the study of conditional relationships between various exposures and key outcome variables through regression analysis. We design a modified data synthesis strategy to target and preserve these conditional relationships, including both nonlinearities and interactions. The proposed techniques are deployed to create a synthetic version of a confidential dataset containing dozens of health, cognitive, and social measurements on nearly 20,000 North Carolina children.
Dynamic Regression Models for Time-Ordered Functional Data
(Project Euclid, 2021) Kowal, Daniel R.
For time-ordered functional data, an important yet challenging task is to forecast functional observations with uncertainty quantification. Scalar predictors are often observed concurrently with functional data and provide valuable information about the dynamics of the functional time series. We develop a fully Bayesian framework for dynamic functional regression, which employs scalar predictors to model the time-evolution of functional data. Functional within-curve dependence is modeled using unknown basis functions, which are learned from the data. The unknown basis provides substantial dimension reduction, which is essential for scalable computing, and may incorporate prior knowledge such as smoothness or periodicity. The dynamics of the time-ordered functional data are specified using a time-varying parameter regression model in which the effects of the scalar predictors evolve over time. To guard against overfitting, we design shrinkage priors that regularize irrelevant predictors and shrink toward time-invariance. Simulation studies decisively confirm the utility of these modeling and prior choices. Posterior inference is available via a customized Gibbs sampler, which offers unrivaled scalability for Bayesian dynamic functional regression. The methodology is applied to model and forecast yield curves using macroeconomic predictors, and demonstrates exceptional forecasting accuracy and uncertainty quantification over the span of four decades.
Dynamic shrinkage processes
(Wiley, 2019) Kowal, Daniel R.; Matteson, David S.; Ruppert, David
We propose a novel class of dynamic shrinkage processes for Bayesian time series and regression analysis. Building on a global–local framework of prior construction, in which continuous scale mixtures of Gaussian distributions are employed for both desirable shrinkage properties and computational tractability, we model dependence between the local scale parameters. The resulting processes inherit the desirable shrinkage behaviour of popular global–local priors, such as the horseshoe prior, but provide additional localized adaptivity, which is important for modelling time series data or regression functions with local features. We construct a computationally efficient Gibbs sampling algorithm based on a Pólya–gamma scale mixture representation of the process proposed. Using dynamic shrinkage processes, we develop a Bayesian trend filtering model that produces more accurate estimates and tighter posterior credible intervals than do competing methods, and we apply the model for irregular curve fitting of minute‐by‐minute Twitter central processor unit usage data. In addition, we develop an adaptive time varying parameter regression model to assess the efficacy of the Fama–French five‐factor asset pricing model with momentum added as a sixth factor. Our dynamic analysis of manufacturing and healthcare industry data shows that, with the exception of the market risk, no other risk factors are significant except for brief periods.
Evaluating integration of letter fragments through contrast and spatially targeted masking
(ARVO, 2024) Zhang, Sherry; Morrison, Jack; Sun, Thomas; Kowal, Daniel R.; Greene, Ernest
Four experiments were conducted to gain a better understanding of the visual mechanisms related to how integration of partial shape cues provides for recognition of the full shape. In each experiment, letters formed as outline contours were displayed as a sequence of adjacent segments (fragments), each visible during a 17-ms time frame. The first experiment varied the contrast of the fragments. There were substantial individual differences in contrast sensitivity, so stimulus displays in the masking experiments that followed were calibrated to the sensitivity of each participant. Masks were displayed either as patterns that filled the entire screen (full field) or as successive strips that were sliced from the pattern, each strip lying across the location of the letter fragment that had been shown a moment before. Contrast of the masks was varied to be lighter or darker than the letter fragments. Full-field masks, whether light or dark, provided relatively little impairment of recognition, as was the case for mask strips that were lighter than the letter fragments. However, dark strip masks proved to be very effective, with the degree of recognition impairment becoming larger as mask contrast was increased. A final experiment found the strip masks to be most effective when they overlapped the location where the letter fragments had been shown a moment before. They became progressively less effective with increased spatial separation from that location. Results are discussed with extensive reference to potential brain mechanisms for integrating shape cues.
Fast, Optimal, and Targeted Predictions Using Parameterized Decision Analysis
(Taylor & Francis, 2022) Kowal, Daniel R.
Prediction is critical for decision-making under uncertainty and lends validity to statistical inference. With targeted prediction, the goal is to optimize predictions for specific decision tasks of interest, which we represent via functionals. Although classical decision analysis extracts predictions from a Bayesian model, these predictions are often difficult to interpret and slow to compute. Instead, we design a class of parameterized actions for Bayesian decision analysis that produce optimal, scalable, and simple targeted predictions. For a wide variety of action parameterizations and loss functions—including linear actions with sparsity constraints for targeted variable selection—we derive a convenient representation of the optimal targeted prediction that yields efficient and interpretable solutions. Customized out-of-sample predictive metrics are developed to evaluate and compare among targeted predictors. Through careful use of the posterior predictive distribution, we introduce a procedure that identifies a set of near-optimal, or acceptable targeted predictors, which provide unique insights into the features and level of complexity needed for accurate targeted prediction. Simulations demonstrate excellent prediction, estimation, and variable selection capabilities. Targeted predictions are constructed for physical activity (PA) data from the National Health and Nutrition Examination Survey to better predict and understand the characteristics of intraday PA. Supplementary materials for this article are available online.
Semiparametric count data regression for self-reported mental health
(Wiley, 2023) Kowal, Daniel R.; Wu, Bohan
“For how many days during the past 30 days was your mental health not good?” The responses to this question measure self-reported mental health and can be linked to important covariates in the National Health and Nutrition Examination Survey (NHANES). However, these count variables present major distributional challenges: the data are overdispersed, zero-inflated, bounded by 30, and heaped in 5- and 7-day increments. To address these challenges, which are especially common for health questionnaire data, we design a semiparametric estimation and inference framework for count data regression. The data-generating process is defined by simultaneously transforming and rounding (STAR) a latent Gaussian regression model. The transformation is estimated nonparametrically and the rounding operator ensures the correct support for the discrete and bounded data. Maximum likelihood estimators are computed using an expectation-maximization (EM) algorithm that is compatible with any continuous data model estimable by least squares. STAR regression includes asymptotic hypothesis testing and confidence intervals, variable selection via information criteria, and customized diagnostics. Simulation studies validate the utility of this framework. Using STAR regression, we identify key factors associated with self-reported mental health and demonstrate substantial improvements in goodness-of-fit compared to existing count data regression models.
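The data-generating process described in this abstract can be illustrated with a small forward simulation. This is a rough sketch only, not the authors' implementation: the log transformation, the floor rounding rule, and the names `simulate_star`, `g_inv`, and `y_max` are illustrative assumptions, whereas the abstract states that the transformation is estimated nonparametrically.

```python
import numpy as np

def simulate_star(X, beta, sigma, g_inv=np.exp, y_max=30, seed=0):
    """Draw bounded counts by transforming and rounding a latent Gaussian regression.

    Assumptions (not from the abstract): the transformation is log, so its
    inverse g_inv is exp, and rounding is floor followed by clipping to [0, y_max].
    """
    rng = np.random.default_rng(seed)
    # Latent Gaussian regression on the transformed scale.
    z = X @ beta + sigma * rng.standard_normal(X.shape[0])
    # Map back to the continuous latent data scale.
    y_star = g_inv(z)
    # Rounding operator: enforces integer support and the upper bound.
    return np.clip(np.floor(y_star), 0, y_max).astype(int)

# Hypothetical design: an intercept and one covariate.
X = np.column_stack([np.ones(500), np.linspace(-1, 1, 500)])
y = simulate_star(X, beta=np.array([1.0, 0.5]), sigma=1.0)
```

Under these assumptions, latent values below 1 round to zero and values above the bound are clipped to 30, so both the zeros and the upper limit arise from the rounding step rather than from a separate model component.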
Spatial Variability in Relationships between Early Childhood Lead Exposure and Standardized Test Scores in Fourth Grade North Carolina Public School Students (2013–2016)
(National Institute of Environmental Health Sciences, National Institutes of Health, 2024) Bravo, Mercedes A.; Kowal, Daniel R.; Zephyr, Dominique; Feldman, Joseph; Ensor, Katherine; Miranda, Marie Lynn
Background: Exposure to lead during childhood is detrimental to children’s health. The extent to which the association between lead exposure and elementary school academic outcomes varies across geography is not known. Objective: Estimate associations between blood lead levels (BLLs) and fourth grade standardized test scores in reading and mathematics in North Carolina using models that allow associations between BLL and test scores to vary spatially across communities. Methods: We link geocoded, individual-level, standardized test score data for North Carolina public school students in fourth grade (2013–2016) with detailed birth records and blood lead testing data retrieved from the North Carolina childhood blood lead state registry on samples typically collected at 1–6 y of age. BLLs were categorized as: 1 μg/dL (reference), 2 μg/dL, 3–4 μg/dL and ≥5 μg/dL. We then fit spatially varying coefficient models that incorporate information sharing (smoothness) across neighboring communities via a Gaussian Markov random field to provide a global estimate of the association between BLL and test scores, as well as census tract–specific estimates (i.e., spatial coefficients). Models adjusted for maternal- and child-level covariates and were fit separately for reading and math. Results: The average BLL across the 91,706 individuals in the analysis dataset was 2.84 μg/dL. Individuals were distributed across 2,002 (out of 2,195) census tracts in North Carolina. In models adjusting for child sex, birth weight percentile for gestational age, and Medicaid participation as well as maternal race/ethnicity, educational attainment, marital status, and tobacco use, BLLs of 2 μg/dL, 3–4 μg/dL and ≥5 μg/dL were associated with overall lower reading test scores of −0.28 [95% confidence interval (CI): −0.43, −0.12], −0.53 (−0.69, −0.38), and −0.79 (−0.99, −0.604), respectively. For BLLs of 1 μg/dL, 2 μg/dL, 3–4 μg/dL and ≥5 μg/dL, spatial coefficients (that is, tract-specific adjustments in reading test score relative to the “global” coefficient) ranged from −9.70 to 2.52, −3.19 to 3.90, −11.14 to 7.85, and −4.73 to 4.33, respectively. Results for mathematics were similar to those for reading. Conclusion: The association between lead exposure and reading and mathematics test scores exhibits considerable heterogeneity across North Carolina communities. These results emphasize the need for prevention and mitigation efforts with respect to lead exposures everywhere, with special attention to locations where the cognitive impact is elevated. https://doi.org/10.1289/EHP13898
Stochastic clustering and pattern matching for real-time geosteering
(Society of Exploration Geophysicists, 2019) Wu, Mingqi; Miao, Yinsen; Panchal, Neilkunal; Kowal, Daniel R.; Vannucci, Marina; Vila, Jeremy; Liang, Faming
We have developed a Bayesian statistical framework for quantitative geosteering in real time. Two types of contemporary geosteering approaches, model based and stratification based, are introduced. The latter is formulated as a Bayesian optimization procedure: The log from a pilot reference well is used as a stratigraphic signature of the geologic structure in a given region; the observed log sequence acquired along the wellbore is projected into the stratigraphic domain given a proposed earth model and directional survey; the pattern similarity between the converted log and the signature is measured by a correlation coefficient; then stochastic searching is performed on the space of all possible earth models to maximize the similarity under constraints of the prior understanding of the drilling process and target formation; finally, an inference is made based on the samples simulated from the posterior distribution using stochastic approximation Monte Carlo in which we extract the most likely earth model and the associated credible intervals as a quantified confidence indicator. We extensively test our method using synthetic and real geosteering data sets. Our method consistently achieves good performance on synthetic data sets with high correlations between the interpreted and the reference logs and provides similar interpretations as the geosteering geologists on four real wells. We also conduct a reliability performance test of the method on a benchmark set of 200 horizontal wells randomly sampled from the Permian Basin. Our Bayesian framework informs geologists with key drilling decisions in real time and helps them navigate the drilling bit into the target formation with confidence.
