Statistics
Browsing Statistics by Title
Now showing 1 - 20 of 159
Item: A Bayesian Integrative Model for Genetical Genomics with Spatially Informed Variable Selection (Libertas Academica, 2014)
Cassese, Alberto; Guindani, Michele; Vannucci, Marina
We consider a Bayesian hierarchical model for the integration of gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. The approach defines a measurement error model that relates the gene expression levels to latent copy number states. In turn, the latent states are related to the observed surrogate CGH measurements via a hidden Markov model. The model further incorporates variable selection with a spatial prior based on a probit link that exploits dependencies across adjacent DNA segments. Posterior inference is carried out via Markov chain Monte Carlo stochastic search techniques. We study the performance of the model in simulations and show better results than those achieved with recently proposed alternative priors. We also show an application to data from a genomic study on lung squamous cell carcinoma, where we identify potential candidate associations between copy number variants and the transcriptional activity of target genes. Gene ontology (GO) analyses of our findings reveal enrichments in genes that code for proteins involved in cancer. Our model also identifies a number of potential candidate biomarkers for further experimental validation.

Item: A Bayesian model for the identification of differentially expressed genes in Daphnia magna exposed to munition pollutants (Wiley, 2015)
Cassese, Alberto; Guindani, Michele; Antczak, Philipp; Falciani, Francesco; Vannucci, Marina
In this article we propose a Bayesian hierarchical model for the identification of differentially expressed genes in Daphnia magna organisms exposed to chemical compounds, specifically munition pollutants in water. The model we propose constitutes one of the very first attempts at a rigorous modeling of the biological effects of water purification. We have data acquired from a purification system that comprises four consecutive purification stages, which we refer to as "ponds," of progressively more contaminated water. We model the expected expression of a gene in a pond as the sum of the mean of the same gene in the previous pond plus a gene- and pond-specific difference. We incorporate a variable selection mechanism for the identification of the differential expressions, with a prior distribution on the probability of a change that accounts for the available information on the concentration of chemical compounds present in the water. We carry out posterior inference via MCMC stochastic search techniques. In the application, we reduce the complexity of the data by grouping genes according to their functional characteristics, based on the KEGG pathway database. This also increases the biological interpretability of the results. Our model successfully identifies a number of pathways that show differential expression between consecutive purification stages. We also find that changes in the transcriptional response are more strongly associated with the presence of certain compounds, with the remaining compounds contributing to a lesser extent. We discuss the sensitivity of these results to the model parameters that measure the influence of the prior information on the posterior inference.
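The pond-to-pond mean structure and the selection prior described above can be sketched as follows; the notation is illustrative, and the authors' exact parameterization and link function may differ:

\[
\mu_{g,p} = \mu_{g,p-1} + \gamma_{g,p}\,\delta_{g,p}, \qquad \gamma_{g,p}\in\{0,1\}, \qquad \Pr(\gamma_{g,p}=1 \mid \mathbf{c}_p) = \Phi\!\left(\alpha_0 + \boldsymbol{\alpha}^{\top}\mathbf{c}_p\right),
\]

where \(g\) indexes genes (or KEGG pathways), \(p\) indexes ponds, \(\delta_{g,p}\) is the gene- and pond-specific difference, \(\gamma_{g,p}\) is the differential-expression indicator, and \(\mathbf{c}_p\) collects the measured compound concentrations in pond \(p\); the probit link \(\Phi\) is shown purely for concreteness.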
Item: A Bayesian Nonparametric Approach for Functional Data Classification with Application to Hepatic Tissue Characterization (Libertas Academica, 2015)
Fronczyk, Kassandra M.; Guindani, Michele; Hobbs, Brian P.; Ng, Chaan S.; Vannucci, Marina
Computed tomography perfusion (CTp) is an emerging functional imaging technology that provides a quantitative assessment of the passage of fluid through blood vessels. Tissue perfusion plays a critical role in oncology due to the proliferation of networks of new blood vessels typical of cancer angiogenesis, which triggers modifications to the vasculature of the surrounding host tissue. In this article, we consider a Bayesian semiparametric model for the analysis of functional data. This method is applied to a study of four interdependent hepatic perfusion CT characteristics that were acquired under the administration of contrast using a sequence of repeated scans over a period of 590 seconds. More specifically, our modeling framework facilitates borrowing of information across patients and tissues. Additionally, the approach enables flexible estimation of the temporal correlation structures exhibited by mappings of the correlated perfusion biomarkers and thus accounts for the heteroskedasticity typically observed in those measurements by incorporating change-points in the covariance estimation. The method is applied to measurements obtained from regions of liver surrounding malignant and benign tissues, for each perfusion biomarker. We demonstrate how to cluster the liver regions on the basis of their CTp profiles, which can be used in a prediction context to classify regions of interest provided by future patients, and thereby assist in discriminating malignant from healthy tissue regions in diagnostic settings.

Item: A Bayesian Nonparametric Spiked Process Prior for Dynamic Model Selection (Project Euclid, 2019)
Cassese, Alberto; Zhu, Weixuan; Guindani, Michele; Vannucci, Marina
In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or "normal" behavior. In this manuscript, we consider the monitoring of pneumonia and influenza (P&I) mortality to detect influenza outbreaks in the continental United States, and we propose a Bayesian nonparametric model selection approach that takes into account the spatio-temporal dependence of outbreaks. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior, which allows borrowing of information across time and the assignment of data to clusters associated with either a null or an alternative process. Spatial dependences are accounted for by means of a Markov random field prior, which allows the selection to be informed by inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and we compare with common threshold methods for detecting outbreaks over time, with more recent Markov-switching-based models, and with spike-and-slab Bayesian nonparametric priors that do not take into account spatio-temporal dependence.

Item: A convex-nonconvex strategy for grouped variable selection (Project Euclid, 2023)
Liu, Xiaoqian; Molstad, Aaron J.; Chi, Eric C.
This paper deals with the grouped variable selection problem. A widely used strategy is to augment the negative log-likelihood function with a sparsity-promoting penalty. Existing methods include the group Lasso, group SCAD, and group MCP. The group Lasso solves a convex optimization problem but suffers from underestimation bias. The group SCAD and group MCP avoid this estimation bias but require solving a nonconvex optimization problem that may be plagued by suboptimal local optima. In this work, we propose an alternative method based on the generalized minimax concave (GMC) penalty, which is a folded concave penalty that maintains the convexity of the objective function. We develop a new method for grouped variable selection in linear regression, the group GMC, that generalizes the strategy of the original GMC estimator. We present a primal-dual algorithm for computing the group GMC estimator and also prove properties of the solution path to guide its numerical computation and tuning parameter selection in practice. We establish error bounds for both the group GMC and original GMC estimators. A rich set of simulation studies and a real data application indicate that the proposed group GMC approach outperforms existing methods in several different aspects under a wide array of scenarios.
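For orientation, the convex-nonconvex idea can be written schematically for the original, ungrouped GMC estimator that the abstract references; the group GMC replaces the l1 norm with a sum of groupwise l2 norms, and the exact scaling conventions are the authors':

\[
\hat{\beta} = \arg\min_{\beta}\; \tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \lambda\,\psi_B(\beta), \qquad
\psi_B(\beta) = \lVert \beta \rVert_1 - \min_{v}\left\{ \lVert v \rVert_1 + \tfrac{1}{2}\lVert B(\beta - v)\rVert_2^2 \right\}.
\]

The penalty \(\psi_B\) is itself folded concave, yet the full objective remains convex whenever \(B^{\top}B \preceq \lambda^{-1} X^{\top}X\), which is the property that lets this strategy avoid both the group Lasso's bias and the spurious local optima of group SCAD and group MCP.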
Item: A Hierarchical Bayesian Model for the Identification of PET Markers Associated to the Prediction of Surgical Outcome after Anterior Temporal Lobe Resection (Frontiers Media S.A., 2017)
Chiang, Sharon; Guindani, Michele; Yeh, Hsiang J.; Dewar, Sandra; Haneef, Zulfi; Stern, John M.; Vannucci, Marina
We develop an integrative Bayesian predictive modeling framework that identifies individual pathological brain states based on the selection of fluoro-deoxyglucose positron emission tomography (PET) imaging biomarkers and evaluates the association of those states with a clinical outcome. We consider data from a study on temporal lobe epilepsy (TLE) patients who subsequently underwent anterior temporal lobe resection. Our modeling framework views the observed profiles of regional glucose metabolism in PET as the phenotypic manifestation of a latent individual pathologic state, which is assumed to vary across the population. The modeling strategy we adopt allows the identification of patient subgroups characterized by latent pathologies differentially associated with the clinical outcome of interest. It also identifies imaging biomarkers characterizing the pathological states of the subjects. In the data application, we identify a subgroup of TLE patients at high risk for post-surgical seizure recurrence after anterior temporal lobe resection, together with a set of discriminatory brain regions that can be used to distinguish the latent subgroups. We show that the proposed method achieves high cross-validated accuracy in predicting post-surgical seizure recurrence.

Item: A joint modeling approach for longitudinal microbiome data improves ability to detect microbiome associations with disease (Public Library of Science, 2020)
Luna, Pamela N.; Mansbach, Jonathan M.; Shaw, Chad A.
Changes in the composition of the microbiome over time are associated with myriad human illnesses. Unfortunately, the lack of analytic techniques has hindered researchers' ability to quantify the association between longitudinal microbial composition and time-to-event outcomes. Prior methodological work developed the joint model for longitudinal and time-to-event data to incorporate time-dependent biomarker covariates into the hazard regression approach to disease outcomes. The original implementation of this joint modeling approach employed a linear mixed effects model to represent the time-dependent covariates. However, when the distribution of the time-dependent covariate is non-Gaussian, as is the case with microbial abundances, researchers require different statistical methodology. We present a joint modeling framework that uses a negative binomial mixed effects model to describe longitudinal taxon abundances. We incorporate these modeled microbial abundances into a hazard function with a parameterization that not only accounts for the proportional nature of microbiome data, but also generates biologically interpretable results. Herein we demonstrate the performance improvements of our approach over existing alternatives via simulation as well as a previously published longitudinal dataset studying the microbiome during pregnancy. The results demonstrate that our joint modeling framework for longitudinal microbiome count data provides a powerful methodology to uncover associations between changes in microbial abundances over time and the onset of disease. This method offers the potential to equip researchers with a deeper understanding of the associations between longitudinal microbial composition changes and disease outcomes. This new approach could potentially lead to new diagnostic biomarkers or inform clinical interventions to help prevent or treat disease.
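A schematic of the two linked submodels in such a joint model, written with notation and links of my own choosing (the paper's exact parameterization, in particular how the hazard handles relative abundance, may differ):

\[
Y_{ij} \sim \mathrm{NegBin}(\mu_{ij}, \phi), \qquad
\log \mu_{ij} = \log T_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i, \qquad
h_i(t) = h_0(t)\,\exp\!\left\{\mathbf{w}_i^{\top}\boldsymbol{\gamma} + \alpha\,\eta_i(t)\right\},
\]

where \(Y_{ij}\) is the taxon read count for subject \(i\) at time \(t_{ij}\), \(T_{ij}\) is the total read count (the offset is what accounts for the proportional nature of the data), \(b_i\) is a subject-level random effect, and \(\eta_i(t)\) is the modeled log relative abundance at time \(t\), whose association with the hazard is captured by \(\alpha\).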
Item: A Post Keynesian Analysis of the Black-Scholes Option Pricing Model (Rice University, 1999)
Thompson, J.R.; Williams, E.E.

Item: A spatiotemporal case-crossover model of asthma exacerbation in the City of Houston (Wiley, 2021)
Schedler, Julia C.; Ensor, Katherine B.
The case-crossover design is a popular construction for analyzing the impact of a transient effect, such as ambient pollution levels, on an acute outcome, such as an asthma exacerbation. The case-crossover design avoids the need to model individual, time-varying risk factors for cases by using cases as their own "controls," chosen to be time periods for which individual risk factors can be assumed constant and need not be modelled. Many studies have examined the complex effects of the control period structure on model performance, but these discussions were simplified when the case-crossover design was shown to be equivalent to various specifications of Poisson regression when exposure is considered constant across study participants. While reasonable for some applications, there are cases where such an assumption does not apply due to spatial variability in exposure, which may affect parameter estimation. This work presents a spatiotemporal model that combines a temporal case-crossover structure with a geometrically aware spatial random effect based on the Hausdorff distance. The model construction incorporates residual spatial structure for settings in which the constant-exposure assumption is not reasonable and the spatial regions are irregular.

Item: A statistical model for removing inter-device differences in spectroscopy (Optical Society of America, 2014)
Wang, Lu; Lee, Jong Soo; Lane, Pierre; Atkinson, E. Neely; Zuluaga, Andres; Follen, Michele; MacAulay, Calum; Cox, Dennis D.
We are investigating spectroscopic devices designed to make in vivo cervical tissue measurements to detect pre-cancerous and cancerous lesions. All devices have the same design and ideally should record identical measurements. However, we observed consistent differences among them. An experiment was designed to study the sources of variation in the recorded measurements. Here we present a log-additive statistical model that incorporates the sources of variability we identified. Based on this model, we estimated from the experimental data the correction factors needed to eliminate the inter-device variability and other sources of variation. These correction factors are intended to improve the accuracy and repeatability of such devices when making future measurements on patient tissue.
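The log-additive idea can be sketched as a simple variance-component decomposition; the components shown are illustrative, and the paper identifies its own set of variability sources:

\[
\log I_{d,s}(\lambda) = \mu_s(\lambda) + \alpha_d(\lambda) + \varepsilon_{d,s}(\lambda),
\]

where \(I_{d,s}(\lambda)\) is the intensity recorded by device \(d\) on sample \(s\) at wavelength \(\lambda\), \(\mu_s(\lambda)\) is the sample's underlying spectrum, \(\alpha_d(\lambda)\) is a device effect, and \(\varepsilon_{d,s}(\lambda)\) is residual noise. On the original intensity scale the estimated device effects \(\exp\{\hat{\alpha}_d(\lambda)\}\) act as multiplicative correction factors, which is what makes the additive-on-the-log-scale formulation convenient.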
Item: Accuracy of optical spectroscopy for the detection of cervical intraepithelial neoplasia without colposcopic tissue information: a step toward automation for low resource settings (Society of Photo-Optical Instrumentation Engineers, 2012-04)
Yamal, Jose-Miguel; Zewdie, Getie A.; Cox, Dennis D.; Atkinson, E. Neely; Cantor, Scott B.; MacAulay, Calum; Davies, Kalatu; Adewole, Isaac; Buys, Timon P. H.; Follen, Michele
Optical spectroscopy has been proposed as an accurate and low-cost alternative for detection of cervical intraepithelial neoplasia. We previously published an algorithm using optical spectroscopy as an adjunct to colposcopy and found good accuracy (sensitivity = 1.00 [95% confidence interval (CI) = 0.92 to 1.00], specificity = 0.71 [95% CI = 0.62 to 0.79]). Those results used measurements taken by expert colposcopists as well as the colposcopy diagnosis. In this study, we trained and tested an algorithm for the detection of cervical intraepithelial neoplasia (i.e., identifying those patients who had a histology reading of CIN 2 or worse) that did not include the colposcopic diagnosis. Furthermore, we explored the interaction between spectroscopy and colposcopy, examining the importance of probe placement expertise. The colposcopic-diagnosis-independent spectroscopy algorithm had a sensitivity of 0.98 (95% CI = 0.89 to 1.00) and a specificity of 0.62 (95% CI = 0.52 to 0.71). The difference in the partial area under the ROC curves between spectroscopy with and without the colposcopic diagnosis was statistically significant at the patient level (p = 0.05) but not at the site level (p = 0.13). The results suggest that the device has high accuracy over a wide range of provider accuracy and hence could plausibly be implemented by providers with limited training.

Item: Adverse Health Outcomes Following Hurricane Harvey: A Comparison of Remotely-Sensed and Self-Reported Flood Exposure Estimates (Wiley, 2023)
Ramesh, Balaji; Callender, Rashida; Zaitchik, Benjamin F.; Jagger, Meredith; Swarup, Samarth; Gohlke, Julia M.
Remotely sensed inundation may help to rapidly identify areas in need of aid during and following floods. Here we evaluate the utility of daily remotely sensed flood inundation measures and estimate their congruence with self-reported home flooding and health outcomes collected via the Texas Flood Registry (TFR) following Hurricane Harvey. Daily flood inundation for the 14 days following the landfall of Hurricane Harvey was acquired from FloodScan. Flood exposure, including the number of days flooded and flood depth, was assigned to geocoded home addresses of TFR respondents (N = 18,920 from 47 counties). Discordance between remotely sensed flooding and self-reported home flooding was measured. Modified Poisson regression models were implemented to estimate risk ratios (RRs) for adverse health outcomes following flood exposure, controlling for potential individual-level confounders. Respondents whose home was in a flooded area based on remotely sensed data were more likely to report injury (RR = 1.5, 95% CI: 1.27-1.77), concentration problems (RR = 1.36, 95% CI: 1.25-1.49), skin rash (RR = 1.31, 95% CI: 1.15-1.48), illness (RR = 1.29, 95% CI: 1.17-1.43), headaches (RR = 1.09, 95% CI: 1.03-1.16), and runny nose (RR = 1.07, 95% CI: 1.03-1.11) compared to respondents whose home was not flooded. Effect sizes were larger when exposure was estimated using respondent-reported home flooding. Near-real-time remote-sensing-based flood products may help to prioritize areas in need of assistance when on-the-ground measures are not accessible.
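The risk ratios above come from modified Poisson regression, i.e., a Poisson GLM with a log link and robust (sandwich) standard errors applied to a binary outcome. A minimal sketch of that estimator on simulated, registry-style data follows; the variable names and covariates are placeholders, not the actual TFR fields.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500

    # Simulated, registry-style data: one row per respondent (placeholder fields).
    df = pd.DataFrame({
        "injury":       rng.binomial(1, 0.10, n),  # binary health outcome
        "home_flooded": rng.binomial(1, 0.40, n),  # remotely sensed exposure flag
        "age":          rng.uniform(18, 90, n),
        "female":       rng.binomial(1, 0.50, n),
    })

    # Poisson GLM with a log link; the robust (HC1) covariance turns this into the
    # "modified Poisson" estimator of risk ratios for a binary outcome.
    X = sm.add_constant(df[["home_flooded", "age", "female"]])
    fit = sm.GLM(df["injury"], X, family=sm.families.Poisson()).fit(cov_type="HC1")

    rr = np.exp(fit.params["home_flooded"])            # risk ratio for flood exposure
    ci = np.exp(fit.conf_int().loc["home_flooded"])    # 95% CI on the risk-ratio scale
    print(f"RR = {rr:.2f}, 95% CI: {ci[0]:.2f}-{ci[1]:.2f}")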
Item: AIDS: The mismanagement of an epidemic (Maxwell Pergamon Macmillan, 1989)
Thompson, J.R.
An argument is made that, far from being a disease which is unstoppable in its epidemic consequences, AIDS has produced an epidemic which owes its present virulence to sociological configurations of rather recent existence. Instead of a vigorous attack on the transmission chain of the epidemic, the emphasis of public health policy has been on finding a vaccine and/or a cure for the disease which produces the epidemic. By means of a simple model, it is argued that by simply closing businesses catering to high-contact-rate anal sex, e.g. sexually oriented bathhouses, the American public health establishment might have avoided most of the tragic consequences of the present epidemic.

Item: An Alternative Approach for Estimating the Accuracy of Colposcopy in Detecting Cervical Precancer (Public Library of Science, 2015)
Davies, Kalatu R.; Cantor, Scott B.; Cox, Dennis D.; Follen, Michele
Introduction: Since colposcopy helps to detect cervical cancer in its precancerous stages, as new strategies and technologies are developed for the clinical management of cervical neoplasia, precisely determining the accuracy of colposcopy is important for characterizing its continued role. Our objective was to employ a more precise methodology to estimate the accuracy of colposcopy and better reflect clinical practice. Study design: For each patient, we compared the worst histology result among colposcopically positive sites to the worst histology result among all sites biopsied, thereby more accurately determining the number of patients who would have been underdiagnosed by colposcopy than previously estimated. Materials and Methods: We utilized data from a clinical trial in which 850 diagnostic patients had been enrolled. Seven hundred and ninety-eight of the 850 patients had been examined by colposcopy, and biopsy samples were taken at colposcopically normal and abnormal sites. Our endpoints of interest were the percentage of patients underdiagnosed and the sensitivity and specificity of colposcopy. Results: With the threshold of low-grade squamous intraepithelial lesions for positive colposcopy and histology diagnoses, the sensitivity of colposcopy decreased from our previous assessment of 87.0% to 74.0%, while specificity remained the same. The drop in sensitivity was the result of histologically positive sites that were diagnosed as negative by colposcopy. Thus, 28.4% of the 798 patients in this diagnostic group would have had their condition underdiagnosed by colposcopy in the clinic. Conclusions: In utilizing biopsies at multiple sites of the cervix, we present a more precise methodology for determining the accuracy of colposcopy. The true accuracy of colposcopy is lower than previously estimated. Nevertheless, our results reinforce previous conclusions that colposcopy has an important role in the diagnosis of cervical precancer.
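The patient-level comparison described in the study design reduces to a small group-by computation over site-level records. A hypothetical sketch follows; the column names, ordinal grade coding, and disease threshold are illustrative, not the trial's actual variables.

    import pandas as pd

    # Hypothetical site-level records: ordinal histology grade (higher = worse)
    # and whether colposcopy called the site abnormal.
    sites = pd.DataFrame({
        "patient":   [1, 1, 2, 2, 3],
        "colpo_pos": [True, False, False, False, True],
        "histology": [3, 2, 1, 3, 0],   # e.g. 3 = low-grade SIL or worse
    })
    THRESH = 3  # disease threshold on the ordinal scale

    def summarize(g):
        worst_all = g["histology"].max()                   # worst result over all biopsied sites
        pos = g.loc[g["colpo_pos"], "histology"]
        worst_pos = pos.max() if len(pos) else -1          # worst result over colposcopy-positive sites
        return pd.Series({"diseased": worst_all >= THRESH,
                          "detected": worst_pos >= THRESH})

    per_patient = sites.groupby("patient").apply(summarize)
    underdiagnosed = per_patient["diseased"] & ~per_patient["detected"]
    sensitivity = (per_patient["diseased"] & per_patient["detected"]).sum() / per_patient["diseased"].sum()
    print(f"underdiagnosed: {underdiagnosed.mean():.0%}, sensitivity: {sensitivity:.2f}")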
Item: Antagonism between viral infection and innate immunity at the single-cell level (Public Library of Science, 2023)
Grabowski, Frederic; Kochańczyk, Marek; Korwek, Zbigniew; Czerkies, Maciej; Prus, Wiktor; Lipniacki, Tomasz
When infected with a virus, cells may secrete interferons (IFNs) that prompt nearby cells to prepare for upcoming infection. Reciprocally, viral proteins often interfere with IFN synthesis and IFN-induced signaling. We modeled the crosstalk between the propagating virus and the innate immune response using an agent-based stochastic approach. By analyzing immunofluorescence microscopy images we observed that the mutual antagonism between the respiratory syncytial virus (RSV) and infected A549 cells leads to dichotomous responses at the single-cell level and complex spatial patterns of cell signaling states. Our analysis indicates that RSV blocks innate responses at three levels: by inhibition of IRF3 activation, inhibition of IFN synthesis, and inhibition of STAT1/2 activation. In turn, proteins coded by IFN-stimulated (STAT1/2-activated) genes inhibit the synthesis of viral RNA and viral proteins. The striking consequence of these inhibitions is a lack of coincidence of viral proteins and IFN expression within single cells. The model enables investigation of the impact of immunostimulatory defective viral particles and signaling network perturbations that could potentially facilitate containment or clearance of the viral infection.

Item: Associations Between Residential Proximity to Traffic and Vascular Disease in a Cardiac Catheterization Cohort (American Heart Association, Inc, 2018)
Ward-Caviness, Cavin K.; Kraus, William E.; Blach, Colette; Haynes, Carol S.; Dowdy, Elaine; Miranda, Marie Lynn; Devlin, Robert; Diaz-Sanchez, David; Cascio, Wayne E.; Mukerjee, Shaibal; Stallings, Casson; Smith, Luther A.; Gregory, Simon G.; Shah, Svati H.; Neas, Lucas M.; Hauser, Elizabeth R.
Objective: Exposure to mobile source emissions is nearly ubiquitous in developed nations and is associated with multiple adverse health outcomes. There is an ongoing need to understand the specificity of traffic exposure associations with vascular outcomes, particularly in individuals with cardiovascular disease. Approach and Results: We performed a cross-sectional study using 2124 individuals residing in North Carolina, United States, who received a cardiac catheterization at the Duke University Medical Center. Traffic-related exposure was assessed via 2 metrics: (1) the distance between the primary residence and the nearest major roadway; and (2) location of the primary residence in regions defined based on local traffic patterns. We examined 4 cardiovascular disease outcomes: hypertension, peripheral arterial disease, the number of diseased coronary vessels, and recent myocardial infarction. Statistical models were adjusted for race, sex, smoking, type 2 diabetes mellitus, body mass index, hyperlipidemia, and home value. Results are expressed in terms of the odds ratio (OR). A 23% decrease in residential distance to major roadways was associated with higher prevalence of peripheral arterial disease (OR=1.29; 95% confidence interval, 1.08-1.55) and hypertension (OR=1.15; 95% confidence interval, 1.01-1.31). Associations with peripheral arterial disease were strongest in men (OR=1.42; 95% confidence interval, 1.17-1.74), while associations with hypertension were strongest in women (OR=1.21; 95% confidence interval, 0.99-1.49). Neither myocardial infarction nor the number of diseased coronary vessels was associated with traffic exposure. Conclusions: Traffic-related exposure is associated with peripheral arterial disease and hypertension, while no associations are observed for 2 coronary-specific vascular outcomes.
Item: An automated respiratory data pipeline for waveform characteristic analysis (Wiley, 2023)
Lusk, Savannah; Ward, Christopher S.; Chang, Andersen; Twitchell-Heyne, Avery; Fattig, Shaun; Allen, Genevera; Jankowsky, Joanna L.; Ray, Russell S.
Comprehensive and accurate analysis of respiratory and metabolic data is crucial to modelling congenital, pathogenic and degenerative diseases converging on autonomic control failure. A lack of tools for high-throughput analysis of respiratory datasets remains a major challenge. We present Breathe Easy, a novel open-source pipeline for processing raw recordings and associated metadata into operative outcomes, publication-worthy graphs and robust statistical analyses, including QQ and residual plots for assumption queries and data transformations. This pipeline uses a facile graphical user interface for uploading data files, setting waveform feature thresholds and defining experimental variables. Breathe Easy was validated against manual selection by experts, which represents the current standard in the field. We demonstrate Breathe Easy's utility by examining a 2-year longitudinal study of an Alzheimer's disease mouse model to assess contributions of forebrain pathology to disordered breathing. Whole body plethysmography has become an important experimental outcome measure for a variety of diseases with primary and secondary respiratory indications. Respiratory dysfunction, while not an initial symptom in many of these disorders, often drives disability or death in patient outcomes. Breathe Easy provides an open-source respiratory analysis tool for all respiratory datasets and represents a necessary improvement upon current analytical methods in the field.
Key points:
- Respiratory dysfunction is a common endpoint for disability and mortality in many disorders throughout life.
- Whole body plethysmography in rodents represents a high face-value method for measuring respiratory outcomes in rodent models of these diseases and disorders.
- Analysis of key respiratory variables remains hindered by manual annotation and analysis that leads to low-throughput results that often exclude a majority of the recorded data.
- Here we present a software suite, Breathe Easy, that automates the process of data selection from raw recordings derived from plethysmography experiments and the analysis of these data into operative outcomes and publication-worthy graphs with statistics.
- We validate Breathe Easy with a terabyte-scale Alzheimer's dataset that examines the effects of forebrain pathology on respiratory function over 2 years of degeneration.
Item: Bayes goes fast: Uncertainty quantification for a covariant energy density functional emulated by the reduced basis method (Frontiers Media S.A., 2023)
Giuliani, Pablo; Godbey, Kyle; Bonilla, Edgard; Viens, Frederi; Piekarewicz, Jorge
A covariant energy density functional is calibrated using a principled Bayesian statistical framework informed by experimental binding energies and charge radii of several magic and semi-magic nuclei. The Bayesian sampling required for the calibration is enabled by the emulation of the high-fidelity model through the implementation of a reduced basis method (RBM), a set of dimensionality reduction techniques that can speed up demanding calculations involving partial differential equations by several orders of magnitude. The RBM emulator we build, using only 100 evaluations of the high-fidelity model, is able to accurately reproduce the model calculations in tens of milliseconds on a personal computer, an increase in speed of nearly a factor of 3,300 when compared to the original solver. Besides the analysis of the posterior distribution of parameters, we present model calculations for masses and radii with properly estimated uncertainties. We also analyze the model correlation between the slope of the symmetry energy L and the neutron skin of 48Ca and 208Pb. The straightforward implementation and outstanding performance of the RBM make it an ideal tool for assisting the nuclear theory community in providing reliable estimates with properly quantified uncertainties of physical observables. Such uncertainty quantification tools will become essential given the expected abundance of data from the recently inaugurated and future experimental and observational facilities.
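The dimensionality-reduction idea behind such emulators can be illustrated with a toy snapshot-based surrogate: build a small basis from high-fidelity solutions via an SVD, then map parameters to basis coefficients. This is a proper-orthogonal-decomposition-style sketch for intuition only; the paper's RBM instead projects the functional's equations onto the reduced basis (Galerkin reduction) rather than fitting coefficients, and the toy "high-fidelity model" below is a stand-in, not a nuclear density functional solver.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 200)

    # Toy stand-in for an expensive high-fidelity solver: parameters -> field on a grid.
    def high_fidelity(theta):
        return theta[0] * np.exp(-x / (0.1 + theta[1])) + theta[2] * np.sin(3.0 * x)

    # 1. Collect snapshots at training parameter values.
    thetas = rng.uniform(0.5, 1.5, size=(100, 3))
    snaps = np.stack([high_fidelity(t) for t in thetas])     # shape (100, 200)
    mean = snaps.mean(axis=0)

    # 2. Reduced basis from the leading right singular vectors of the snapshot matrix.
    _, _, Vt = np.linalg.svd(snaps - mean, full_matrices=False)
    basis = Vt[:5]                                           # 5 basis functions

    # 3. Cheap surrogate: least-squares map from parameters to reduced coefficients.
    coeffs = (snaps - mean) @ basis.T                        # shape (100, 5)
    design = np.c_[np.ones(len(thetas)), thetas]
    A, *_ = np.linalg.lstsq(design, coeffs, rcond=None)

    def emulate(theta):
        return mean + (np.r_[1.0, theta] @ A) @ basis

    err = np.abs(emulate(thetas[0]) - high_fidelity(thetas[0])).max()
    print(f"max emulation error on a training point: {err:.3e}")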
Item: BayesBD: An R Package for Bayesian Inference on Image Boundaries (The R Foundation, 2017)
Syring, Nicholas; Li, Meng
We present the BayesBD package providing Bayesian inference for boundaries of noisy images. The BayesBD package implements flexible Gaussian process priors indexed by the circle to recover the boundary in a binary or Gaussian-noised image. The boundary recovered by BayesBD has the practical advantages of guaranteed geometric restrictions and convenient joint inferences under certain assumptions, in addition to its desirable theoretical property of achieving a (nearly) minimax optimal rate in a way that is adaptive to the unknown smoothness. The core sampling tasks for our model have linear complexity and are implemented in C++ for computational efficiency using the packages Rcpp and RcppArmadillo. Users can access the full functionality of the package in both the command line and the corresponding Shiny application. Additionally, the package includes numerous utility functions to aid users in data preparation and analysis of results. We compare BayesBD with selected existing packages using both simulations and real data applications, demonstrating the excellent performance and flexibility of BayesBD even when the observation contains complicated structural information that may violate its assumptions.

Item: Bayesian data synthesis and the utility-risk trade-off for mixed epidemiological data (Project Euclid, 2022)
Feldman, Joseph; Kowal, Daniel R.
Much of the microdata used for epidemiological studies contains sensitive measurements on real individuals. As a result, such microdata cannot be published because of privacy concerns, and without public access to these data, any statistical analyses originally published on them are nearly impossible to reproduce. To promote the dissemination of key datasets for analysis without jeopardizing the privacy of individuals, we introduce a cohesive Bayesian framework for the generation of fully synthetic, high-dimensional microdatasets of mixed categorical, binary, count, and continuous variables. This process centers on a joint Bayesian model that is simultaneously compatible with all of these data types, enabling the creation of mixed synthetic datasets through posterior predictive sampling. Furthermore, a focal point of epidemiological data analysis is the study of conditional relationships between various exposures and key outcome variables through regression analysis. We design a modified data synthesis strategy to target and preserve these conditional relationships, including both nonlinearities and interactions. The proposed techniques are deployed to create a synthetic version of a confidential dataset containing dozens of health, cognitive, and social measurements on nearly 20,000 North Carolina children.
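The core mechanism, posterior predictive sampling from a fitted Bayesian model, can be shown in a few lines. The sketch below uses a single continuous variable with a conjugate normal model and a plug-in variance purely for illustration; the paper's framework jointly models mixed categorical, binary, count, and continuous variables and specifically targets conditional (regression) relationships.

    import numpy as np

    rng = np.random.default_rng(1)

    # Stand-in for a confidential continuous measurement on n individuals.
    y = rng.normal(loc=100.0, scale=15.0, size=2000)

    # Conjugate normal model for the mean with a vague Normal(0, 10^6) prior and a
    # plug-in variance (kept simple on purpose): the posterior for mu is Normal(m_n, v_n).
    sigma2 = y.var(ddof=1)
    v_n = 1.0 / (1.0 / 1e6 + len(y) / sigma2)
    m_n = v_n * (y.sum() / sigma2)

    # Fully synthetic copy via posterior predictive sampling: draw a parameter from
    # the posterior, then draw new (synthetic) individuals from the data model.
    mu_draw = rng.normal(m_n, np.sqrt(v_n))
    synthetic = rng.normal(mu_draw, np.sqrt(sigma2), size=len(y))

    # Analyses run on the synthetic copy should track the confidential original.
    print(f"original mean {y.mean():.2f}  vs  synthetic mean {synthetic.mean():.2f}")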