Browsing by Author "Guindani, Michele"

A Bayesian Integrative Model for Genetical Genomics with Spatially Informed Variable Selection (Libertas Academica, 2014)
Cassese, Alberto; Guindani, Michele; Vannucci, Marina
We consider a Bayesian hierarchical model for the integration of gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. The approach defines a measurement error model that relates the gene expression levels to latent copy number states. In turn, the latent states are related to the observed surrogate CGH measurements via a hidden Markov model. The model further incorporates variable selection with a spatial prior based on a probit link that exploits dependencies across adjacent DNA segments. Posterior inference is carried out via Markov chain Monte Carlo stochastic search techniques. We study the performance of the model in simulations and show better results than those achieved with recently proposed alternative priors. We also show an application to data from a genomic study on lung squamous cell carcinoma, where we identify potential candidates of associations between copy number variants and the transcriptional activity of target genes. Gene ontology (GO) analyses of our findings reveal enrichments in genes that code for proteins involved in cancer. Our model also identifies a number of potential candidate biomarkers for further experimental validation.

A Bayesian Nonparametric Approach for Functional Data Classification with Application to Hepatic Tissue Characterization (Libertas Academica, 2015)
Fronczyk, Kassandra M.; Guindani, Michele; Hobbs, Brian P.; Ng, Chaan S.; Vannucci, Marina
Computed tomography perfusion (CTp) is an emerging functional imaging technology that provides a quantitative assessment of the passage of fluid through blood vessels.
Tissue perfusion plays a critical role in oncology due to the proliferation of networks of new blood vessels typical of cancer angiogenesis, which triggers modifications to the vasculature of the surrounding host tissue. In this article, we consider a Bayesian semiparametric model for the analysis of functional data. This method is applied to a study of four interdependent hepatic perfusion CT characteristics that were acquired under the administration of contrast using a sequence of repeated scans over a period of 590 seconds. More specifically, our modeling framework facilitates borrowing of information across patients and tissues. Additionally, the approach enables flexible estimation of temporal correlation structures exhibited by mappings of the correlated perfusion biomarkers and thus accounts for the heteroskedasticity typically observed in those measurements, by incorporating change-points in the covariance estimation. This method is applied to measurements obtained from regions of liver surrounding malignant and benign tissues, for each perfusion biomarker. We demonstrate how to cluster the liver regions on the basis of their CTp profiles, which can be used in a prediction context to classify regions of interest provided by future patients, and thereby assist in discriminating malignant from healthy tissue regions in diagnostic settings.

A Bayesian Nonparametric Spiked Process Prior for Dynamic Model Selection (Project Euclid, 2019)
Cassese, Alberto; Zhu, Weixuan; Guindani, Michele; Vannucci, Marina
In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or “normal” behavior.
In this manuscript, we consider the monitoring of pneumonia and influenza (P&I) mortality, to detect influenza outbreaks in the continental United States, and propose a Bayesian nonparametric model selection approach to take into account the spatio-temporal dependence of outbreaks. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior, which allows borrowing information across time and assigning data to clusters associated with either a null or an alternate process. Spatial dependences are accounted for by means of a Markov random field prior, which allows the selection to be informed by inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and compare with common threshold methods for detecting outbreaks over time, with more recent Markov switching based models, and with spike-and-slab Bayesian nonparametric priors that do not take into account spatio-temporal dependence.

A Hierarchical Bayesian Model for the Identification of PET Markers Associated to the Prediction of Surgical Outcome after Anterior Temporal Lobe Resection (Frontiers Media S.A., 2017)
Chiang, Sharon; Guindani, Michele; Yeh, Hsiang J.; Dewar, Sandra; Haneef, Zulfi; Stern, John M.; Vannucci, Marina
We develop an integrative Bayesian predictive modeling framework that identifies individual pathological brain states based on the selection of fluoro-deoxyglucose positron emission tomography (PET) imaging biomarkers and evaluates the association of those states with a clinical outcome. We consider data from a study on temporal lobe epilepsy (TLE) patients who subsequently underwent anterior temporal lobe resection. Our modeling framework looks at the observed profiles of regional glucose metabolism in PET as the phenotypic manifestation of a latent individual pathologic state, which is assumed to vary across the population.
The modeling strategy we adopt allows the identification of patient subgroups characterized by latent pathologies differentially associated with the clinical outcome of interest. It also identifies imaging biomarkers characterizing the pathological states of the subjects. In the data application, we identify a subgroup of TLE patients at high risk for post-surgical seizure recurrence after anterior temporal lobe resection, together with a set of discriminatory brain regions that can be used to distinguish the latent subgroups. We show that the proposed method achieves high cross-validated accuracy in predicting post-surgical seizure recurrence.

An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data (BioMed Central, 2017)
Wadsworth, W. Duncan; Argiento, Raffaele; Guindani, Michele; Galloway-Pena, Jessica; Shelbourne, Samuel A.; Vannucci, Marina
Background: The Human Microbiome has been variously associated with the immune-regulatory mechanisms involved in the prevention or development of many non-infectious human diseases such as autoimmunity, allergy and cancer. Integrative approaches which aim at associating the composition of the human microbiome with other available information, such as clinical covariates and environmental predictors, are paramount to develop a more complete understanding of the role of microbiome in disease development. Results: In this manuscript, we propose a Bayesian Dirichlet-Multinomial regression model which uses spike-and-slab priors for the selection of significant associations between a set of available covariates and taxa from a microbiome abundance table. The approach allows straightforward incorporation of the covariates through a log-linear regression parametrization of the parameters of the Dirichlet-Multinomial likelihood.
Inference is conducted through a Markov Chain Monte Carlo algorithm, and selection of the significant covariates is based upon the assessment of posterior probabilities of inclusions and the thresholding of the Bayesian false discovery rate. We design a simulation study to evaluate the performance of the proposed method, and then apply our model to a publicly available dataset obtained from the Human Microbiome Project which associates taxa abundances with KEGG orthology pathways. The method is implemented in specifically developed R code, which has been made publicly available. Conclusions: Our method compares favorably in simulations to several recently proposed approaches for similarly structured data, in terms of increased accuracy and reduced false positive as well as false negative rates. In the application to the data from the Human Microbiome Project, a close evaluation of the biological significance of our findings confirms existing associations in the literature.

A Bayesian hierarchical model for maximizing the vascular adhesion of nanoparticles (Springer, 2014)
Fronczyk, Kassandra; Guindani, Michele; Vannucci, Marina; Palange, Annalisa; Decuzzi, Paolo
The complex vascular dynamics and wall deposition of systemically injected nanoparticles is regulated by their geometrical properties (size, shape) and biophysical parameters (ligand–receptor bond type and surface density, local shear rates). Although sophisticated computational models have been developed to capture the vascular behavior of nanoparticles, it is increasingly recognized that purely deterministic approaches, where the governing parameters are known a priori and conclusively describe behaviors based on physical characteristics, may be too restrictive to accurately reflect natural processes. Here, a novel computational framework is proposed by coupling the physics dictating the vascular adhesion of nanoparticles with a stochastic model. In particular, two governing parameters (i.e.
the ligand–receptor bond length and the ligand surface density on the nanoparticle) are treated as two stochastic quantities, whose values are not fixed a priori but would rather range in defined intervals with a certain probability. This approach is used to predict the deposition of spherical nanoparticles with different radii, ranging from 750 to 6,000 nm, in a parallel plate flow chamber under different flow conditions, with a shear rate ranging from 50 to 90 s⁻¹. It is demonstrated that the resulting stochastic model can predict the experimental data more accurately than the original deterministic model. This approach allows one to increase the predictive power of mathematical models of any natural process by accounting for the experimental and intrinsic biological uncertainties.

Bayesian Joint Graphical Modeling Approaches for Covariance and Dynamic Functional Connectivity Analysis from Neuro-Imaging Data (2018-11-27)
Warnick, Ryan Scott; Vannucci, Marina; Guindani, Michele
Recently, interest in connectivity analysis in the neuroscience literature has grown alongside interest in graphical modeling approaches in the statistical literature. The emergence of interest in these two domains has developed into a collaborative effort, which tries to leverage the statistical methodologies used to estimate graphical models and map them to analogous brain networks estimated from neuroscience data. The types of connectivity inferred between brain regions vary in their physiological and statistical interpretations, but in the functional MRI (fMRI) literature, two areas have emerged. In this thesis we focus on "functional" connectivity: a type of connectivity defined by interpreting networks as brain regions which covary in a similar fashion over some period of time.
A refinement of this interpretation exists by instead examining brain regions which have a specified conditional independence structure over a period of time, and through this interpretation of functional connectivity, approaches for graphical modeling can be applied. In this thesis we estimate temporal blocks with similar conditional independence behavior through the framework of Gaussian Graphical Models, and from these data jointly estimate the functional networks by using recently developed methods for joint graphical model estimation.

Bayesian Methods for the Analysis of Microbiome Data (2016-11-30)
Wadsworth, W Duncan; Guindani, Michele; Vannucci, Marina
Bacteria, archaea, viruses, and fungi are present in large numbers both on and inside of our bodies. On average, only one in ten of “our” cells contains human DNA. The other 90% belong to a tremendous diversity of microbes, some of which are fundamentally related to health and disease mechanisms as documented in numerous recent biomedical studies (Turnbaugh et al., 2009; The Human Microbiome Project, 2012b; Knights et al., 2013; Arpaia et al., 2013; Pickard et al., 2014). Some of these microbes are beneficial while others are detrimental, and, since their abundances are poorly understood, identifying microbes associated with interesting phenotypes is of great importance. However, due to the complexity of these systems and certain characteristics of the data, there are still limited numbers of appropriate statistical tools available for such a task. In this research work I will describe the basic features of microbiome abundance data and present two new modeling approaches that can be used to address some of the challenges presented by this data type. The first approach accomplishes a data integration and model selection goal by associating covariates with microbiome data.
The second provides a method of correcting for multiple hypotheses as is common when testing for differential species abundance between experimental or observational conditions. We illustrate the performances of both methods in simulation studies, and in applications to freely available datasets. Finally, we further discuss their potential in microbiome research and possible future extensions.

A Bayesian model for the identification of differentially expressed genes in Daphnia magna exposed to munition pollutants (Wiley, 2015)
Cassese, Alberto; Guindani, Michele; Antczak, Philipp; Falciani, Francesco; Vannucci, Marina
In this article we propose a Bayesian hierarchical model for the identification of differentially expressed genes in Daphnia magna organisms exposed to chemical compounds, specifically munition pollutants in water. The model we propose constitutes one of the very first attempts at a rigorous modeling of the biological effects of water purification. We have data acquired from a purification system that comprises four consecutive purification stages, which we refer to as "ponds," of progressively more contaminated water. We model the expected expression of a gene in a pond as the sum of the mean of the same gene in the previous pond plus a gene-pond specific difference. We incorporate a variable selection mechanism for the identification of the differential expressions, with a prior distribution on the probability of a change that accounts for the available information on the concentration of chemical compounds present in the water. We carry out posterior inference via MCMC stochastic search techniques. In the application, we reduce the complexity of the data by grouping genes according to their functional characteristics, based on the KEGG pathway database. This also increases the biological interpretability of the results.
Our model successfully identifies a number of pathways that show differential expression between consecutive purification stages. We also find that changes in the transcriptional response are more strongly associated with the presence of certain compounds, with the remaining contributing to a lesser extent. We discuss the sensitivity of these results to the model parameters that measure the influence of the prior information on the posterior inference.

Bayesian models for functional magnetic resonance imaging data analysis (Wiley, 2015)
Zhang, Linlin; Guindani, Michele; Vannucci, Marina
Functional magnetic resonance imaging (fMRI), a noninvasive neuroimaging method that provides an indirect measure of neuronal activity by detecting blood flow changes, has experienced an explosive growth in the past years. Statistical methods play a crucial role in understanding and analyzing fMRI data. Bayesian approaches, in particular, have shown great promise in applications. A remarkable feature of fully Bayesian approaches is that they allow a flexible modeling of spatial and temporal correlations in the data. This article provides a review of the most relevant models developed in recent years. We divide methods according to the objective of the analysis. We start from spatiotemporal models for fMRI data that detect task-related activation patterns. We then address the very important problem of estimating brain connectivity. We also touch upon methods that focus on making predictions of an individual's brain activity or a clinical or behavioral response. We conclude with a discussion of recent integrative models that aim at combining fMRI data with other imaging modalities, such as electroencephalography/magnetoencephalography (EEG/MEG) and diffusion tensor imaging (DTI) data, measured on the same subjects.
We also briefly discuss the emerging field of imaging genetics.

A Bayesian nonparametric approach for the analysis of multiple categorical item responses (Elsevier, 2015)
Waters, Andrew; Fronczyk, Kassandra; Guindani, Michele; Baraniuk, Richard G.; Vannucci, Marina
We develop a modeling framework for joint factor and cluster analysis of datasets where multiple categorical response items are collected on a heterogeneous population of individuals. We introduce a latent factor multinomial probit model and employ prior constructions that allow inference on the number of factors as well as clustering of the subjects into homogeneous groups according to their relevant factors. Clustering, in particular, allows us to borrow strength across subjects, therefore helping in the estimation of the model parameters, particularly when the number of observations is small. We employ Markov chain Monte Carlo techniques and obtain tractable posterior inference for our objectives, including sampling of missing data. We demonstrate the effectiveness of our method on simulated data. We also analyze two real-world educational datasets and show that our method outperforms state-of-the-art methods. In the analysis of the real-world data, we uncover hidden relationships between the questions and the underlying educational concepts, while simultaneously partitioning the students into groups of similar educational mastery.

Bayesian nonparametric models for functional magnetic resonance imaging (fMRI) data (2015-04-24)
Zhang, Linlin; Guindani, Michele; Vannucci, Marina; Schweinberger, Michael; Cox, Steven
In this research work, I propose Bayesian nonparametric approaches to model functional magnetic resonance imaging (fMRI) data. Due to the complex spatial and temporal correlation structure as well as the high dimensionality of fMRI data, statistical methods play a crucial role in the analysis of fMRI data.
My research focuses on developing novel methods that incorporate both temporal and spatial correlations into a single modeling framework and simultaneously capture brain connectivity via appropriate priors. First, I propose a spatio-temporal nonparametric Bayesian variable selection model of single-subject fMRI data. The method provides a joint analytical framework that allows the detection of activated brain regions in response to a stimulus and the inference of the clustering of spatially remote voxels that exhibit fMRI time series with similar characteristics. I show good performance of the model on inference through simulations, and demonstrate via synthetic data analysis that the model outperforms methods implemented in SPM8, a standard software package for fMRI data analysis. I also apply the model to an fMRI study on attention to visual motion, and illustrate the results of activation detection and clustering. Then I propose a Bayesian modeling approach to the analysis of multiple-subject fMRI data. The proposed method provides a unified, single stage, and probabilistically coherent Bayesian framework for the inference of task-related brain activity. Furthermore, I employ advanced Bayesian nonparametric priors to tie the activation strengths within and across subjects, and graphical network priors to model the complex spatio-temporal correlation structure observed in fMRI scans from multiple subjects. I develop a variational Bayesian method for inference, in addition to a Markov Chain Monte Carlo (MCMC) method. I investigate the performance of the proposed model on simulated data, and compare its performance to competing methods on synthetic data. In an application to data from an fMRI study on breast cancer survivors, the model demonstrates excellent estimation performance.

Erratum to: An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data (BioMed Central, 2017)
Wadsworth, W. D; Argiento, Raffaele; Guindani, Michele; Galloway-Pena, Jessica; Shelburne, Samuel A; Vannucci, Marina

Filtering and Estimation for a Class of Stochastic Volatility Models with Intractable Likelihoods (Project Euclid, 2019)
Vankov, Emilian R.; Guindani, Michele; Ensor, Katherine B.
We introduce a new approach to latent state filtering and parameter estimation for a class of stochastic volatility models (SVMs) for which the likelihood function is unknown. The α-stable stochastic volatility model provides a flexible framework for capturing asymmetry and heavy tails, which is useful when modeling financial returns. However, the α-stable distribution lacks a closed form for the probability density function, which prevents the direct application of standard Bayesian filtering and estimation techniques such as sequential Monte Carlo and Markov chain Monte Carlo. To obtain filtered volatility estimates, we develop a novel approximate Bayesian computation (ABC) based auxiliary particle filter, which provides improved performance through better proposal distributions. Further, we propose a new particle based MCMC (PMCMC) method for joint estimation of the parameters and latent volatility states. With respect to other extensions of PMCMC, we introduce an efficient single filter particle Metropolis-within-Gibbs algorithm which can be applied for obtaining inference on the parameters of an asymmetric α-stable stochastic volatility model. We show the increased efficiency in the estimation process through a simulation study.
Finally, we highlight the necessity for modeling asymmetric α-stable SVMs through an application to propane weekly spot prices.

A hierarchical Bayesian model for inference of copy number variants and their association to gene expression (Institute of Mathematical Statistics, 2014)
Cassese, Alberto; Guindani, Michele; Tadesse, Mahlet G.; Falciani, Francesco; Vannucci, Marina
A number of statistical models have been successfully developed for the analysis of high-throughput data from a single source, but few methods are available for integrating data from different sources. Here we focus on integrating gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. We specify a measurement error model that relates the gene expression levels to latent copy number states which, in turn, are related to the observed surrogate CGH measurements via a hidden Markov model. We employ selection priors that exploit the dependencies across adjacent copy number states and investigate MCMC stochastic search techniques for posterior inference. Our approach results in a unified modeling framework for simultaneously inferring copy number variants (CNV) and identifying their significant associations with mRNA transcript abundance.
We show performance on simulated data and illustrate an application to data from a genomic study on human cancer cell lines.

An Integrative Bayesian Modeling Approach to Imaging Genetics (Taylor & Francis, 2013)
Stingo, Francesco C.; Guindani, Michele; Vannucci, Marina; Calhoun, Vince D.

A Network Biology Approach Identifies Molecular Cross-Talk between Normal Prostate Epithelial and Prostate Carcinoma Cells (Public Library of Science, 2016)
Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antczak, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J.; Guindani, Michele; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco
The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact in biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in-depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed.
Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and is predictive of patient survival. Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication networks in a wide spectrum of biological systems.

A spatio-temporal nonparametric Bayesian variable selection model of fMRI data for clustering correlated time courses (Elsevier, 2014)
Zhang, Linlin; Guindani, Michele; Versace, Francesco; Vannucci, Marina
In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows the detection of regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, the inference of the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model.
We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, therefore capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov Chain Monte Carlo (MCMC) sampling techniques that combine Metropolis–Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data.

A spatiotemporal nonparametric Bayesian model of multi-subject fMRI data (Project Euclid, 2016)
Zhang, Linlin; Guindani, Michele; Versace, Francesco; Engelmann, Jeffrey M.; Vannucci, Marina
In this paper we propose a unified, probabilistically coherent framework for the analysis of task-related brain activity in multi-subject fMRI experiments. This is distinct from two-stage “group analysis” approaches traditionally considered in the fMRI literature, which separate the inference on the individual fMRI time courses from the inference at the population level. In our modeling approach we consider a spatiotemporal linear regression model and specifically account for the between-subjects heterogeneity in neuronal activity via a spatially informed multi-subject nonparametric variable selection prior. For posterior inference, in addition to Markov chain Monte Carlo sampling algorithms, we develop suitable variational Bayes algorithms.
We show on simulated data that variational Bayes inference achieves satisfactory results at lower computational cost than MCMC, allowing scalability of our methods. In an application to data collected to assess brain responses to emotional stimuli, our method correctly detects activation in visual areas when visual stimuli are presented.

Temporal and spectral characteristics of dynamic functional connectivity between resting-state networks reveal information beyond static connectivity (Public Library of Science, 2018)
Chiang, Sharon; Vankov, Emilian R.; Yeh, Hsiang J.; Guindani, Michele; Vannucci, Marina; Haneef, Zulfi; Stern, John M.
Estimation of functional connectivity (FC) has become an increasingly powerful tool for investigating healthy and abnormal brain function. Static connectivity, in particular, has played a large part in guiding conclusions from the majority of resting-state functional MRI studies. However, accumulating evidence points to the presence of temporal fluctuations in FC, leading to increasing interest in estimating FC as a dynamic quantity. One central issue that has arisen in this new view of connectivity is the dramatic increase in complexity caused by dynamic functional connectivity (dFC) estimation. To computationally handle this increased complexity, a limited set of dFC properties, primarily the mean and variance, have generally been considered. Additionally, it remains unclear how to integrate the increased information from dFC into pattern recognition techniques for subject-level prediction. In this study, we propose an approach to address these two issues based on a large number of previously unexplored temporal and spectral features of dynamic functional connectivity. A Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model is used to estimate time-varying patterns of functional connectivity between resting-state networks.
Time-frequency analysis is then performed on dFC estimates, and a large number of previously unexplored temporal and spectral features drawn from signal processing literature are extracted for dFC estimates. We apply the investigated features to two neurologic populations of interest, healthy controls and patients with temporal lobe epilepsy, and show that the proposed approach leads to substantial increases in predictive performance compared to both traditional estimates of static connectivity and current approaches to dFC. Variable importance is assessed and shows that there are several quantities that can be extracted from the dFC signal which are more informative than the traditional mean or variance of dFC. This work illuminates many previously unexplored facets of the dynamic properties of functional connectivity between resting-state networks, and provides a platform for dynamic functional connectivity analysis that facilitates its usage as an investigative measure for healthy as well as abnormal brain function.