Browsing by Author "Schweinberger, Michael"

Item Bayesian nonparametric models for functional magnetic resonance imaging (fMRI) data (2015-04-24)
Zhang, Linlin; Guindani, Michele; Vannucci, Marina; Schweinberger, Michael; Cox, Steven
In this research work, I propose Bayesian nonparametric approaches to model functional magnetic resonance imaging (fMRI) data. Due to the complex spatial and temporal correlation structure as well as the high dimensionality of fMRI data, statistical methods play a crucial role in the analysis of fMRI data. My research focuses on developing novel methods that incorporate both temporal and spatial correlations into a single modeling framework and simultaneously capture brain connectivity via appropriate priors. First, I propose a spatio-temporal nonparametric Bayesian variable selection model of single-subject fMRI data. The method provides a joint analytical framework that makes it possible to detect brain regions activated in response to a stimulus and to infer the clustering of spatially remote voxels that exhibit fMRI time series with similar characteristics. I show through simulations that the model performs well on inference, and demonstrate via synthetic data analysis that the model outperforms methods implemented in SPM8, a standard software package for fMRI data analysis. I also apply the model to an fMRI study on attention to visual motion, and illustrate the results of activation detection and clustering. Then I propose a Bayesian modeling approach to the analysis of multiple-subject fMRI data. The proposed method provides a unified, single-stage, and probabilistically coherent Bayesian framework for the inference of task-related brain activity. Furthermore, I employ advanced Bayesian nonparametric priors to tie the activation strengths within and across subjects, and graphical network priors to model the complex spatio-temporal correlation structure observed in fMRI scans from multiple subjects.
I develop a variational Bayesian method for inference, in addition to a Markov chain Monte Carlo (MCMC) method. I investigate the performance of the proposed model on simulated data, and compare its performance to competing methods on synthetic data. In an application to data from an fMRI study on breast cancer survivors, the model demonstrates excellent estimation performance.

Item Consistent estimation of high-dimensional random graph models with dependent edge variables (2020-04-24)
Stewart, Jonathan Roy; Schweinberger, Michael
An important question in statistical network analysis is how to construct models of random graphs with dependent edges without sacrificing computational scalability and statistical guarantees. This thesis advances models, methods, and theory for dependent network data by introducing a simple and flexible approach to specifying random graph models that allow edges to be dependent and dependence among edges to propagate throughout the random graph. As examples, we develop generalizations of β-models with dependent edges capturing brokerage in networks. On the statistical side, we obtain the first consistency results and convergence rates for maximum likelihood in high-dimensional settings where a single observation of a network with dependent random variables is available and the number of parameters increases with network size. The theoretical results developed here are general and make weak assumptions, requiring nothing more than strictly positive distributions with exponential-family parameterizations, and may be of independent interest. We showcase consistency results and convergence rates in the special case of generalized β-models with dependent edges and parameter vectors of increasing dimension, and demonstrate through simulations that the statistical error is low even when the network has no more than 1,000 nodes.
The thesis concludes with two applications, one involving social networks and the other involving human brain networks.

Item Disaster response on September 11, 2001 through the lens of statistical network analysis (Elsevier, 2014)
Schweinberger, Michael; Petrescu-Prahova, Miruna; Vu, Duy Quang
The rescue and relief operations triggered by the September 11, 2001 attacks on the World Trade Center in New York City demanded collaboration among hundreds of organisations. To shed light on the response to the September 11, 2001 attacks and help to plan and prepare the response to future disasters, we study the inter-organisational network that emerged in response to the attacks. Studying the inter-organisational network can help to shed light on (1) whether some organisations dominated the inter-organisational network and facilitated communication and coordination of the disaster response; (2) whether the dominating organisations were supposed to coordinate disaster response or emerged as coordinators in the wake of the disaster; and (3) the degree of network redundancy and sensitivity of the inter-organisational network to disturbances following the initial disaster. We introduce a Bayesian framework which can answer the substantive questions of interest while being as simple and parsimonious as possible. The framework allows organisations to have varying propensities to collaborate, while taking covariates into account, and makes it possible to assess whether the inter-organisational network had network redundancy, in the form of transitivity, by using a test which may be regarded as a Bayesian score test. We discuss implications in terms of disaster management.

Item High-dimensional and dependent data with additional structure (2017-04-19)
Babkin, Sergii; Schweinberger, Michael
The age of computing has enabled the collection of massive amounts of data. These data present numerous statistical challenges, because many data sets are high-dimensional and dependent.
While statistical inference for high-dimensional and dependent data is challenging, many data come with additional structure that can be exploited to facilitate statistical inference. This thesis considers two widely used classes of models for high-dimensional and dependent data with additional structure, high-dimensional multivariate time series and exponential-family random graph models. In the case of high-dimensional multivariate time series, there is often additional structure in the form of spatial structure, e.g., air pollution is monitored by monitors and the geographical locations of monitors are known. If air pollutants cannot travel long distances, then the estimation of past-present and present-present dependencies of air pollution at monitors can be restricted to short distances. Here, a novel two-step estimation approach is proposed to estimate the range of dependence along with the parameters of multivariate time series in high-dimensional settings. Theoretical results show that the two-step estimation approach reduces statistical error in high-dimensional settings. Simulation results confirm that the two-step estimation approach reduces statistical error and computing time. An application to air pollution in the U.S. demonstrates that the two-step estimation approach gives rise to results that are in line with scientific knowledge, whereas estimation approaches ignoring the spatial structure report results that are in conflict with scientific knowledge. In the case of exponential-family random graph models, it is likewise common that there is additional structure: e.g., it is known that many networks, such as insurgencies and terrorist networks, are local in nature. Here, a novel two-step estimation approach is proposed to estimate the local structure along with the dependence pattern of networks. 
The proposed two-step estimation approach can be implemented in parallel and hence paves the way for massive-scale estimation of exponential-family random graph models. Theoretical results are provided along with simulation results. An application to a large Amazon product network demonstrates the usefulness of the proposed two-step estimation approach.

Item High-Dimensional Multivariate Time Series With Additional Structure (Taylor & Francis, 2017)
Schweinberger, Michael; Babkin, Sergii; Ensor, Katherine B.; Center for Computational Finance and Economic Systems
High-dimensional multivariate time series are challenging due to the dependent and high-dimensional nature of the data, but in many applications there is additional structure that can be exploited to reduce computing time along with statistical error. We consider high-dimensional vector autoregressive processes with spatial structure, a simple and common form of additional structure. We propose novel high-dimensional methods that take advantage of such structure without making model assumptions about how distance affects dependence. We provide nonasymptotic bounds on the statistical error of parameter estimators in high-dimensional settings and show that the proposed approach reduces the statistical error. An application to air pollution in the USA demonstrates that the estimation approach reduces both computing time and prediction error and gives rise to results that are meaningful from a scientific point of view, in contrast to high-dimensional methods that ignore spatial structure.
In practice, these high-dimensional methods can be used to decompose high-dimensional multivariate time series into lower-dimensional multivariate time series that can be studied by other methods in more depth.

Item Local dependence in random graph models: characterization, properties and statistical inference (Wiley, 2015)
Schweinberger, Michael; Handcock, Mark S.
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure.
In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’.

Item Model-based clustering of large networks (Project Euclid, 2013)
Vu, Duy Q.; Hunter, David R.; Schweinberger, Michael
We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger data sets than those seen elsewhere in the literature. The more flexible framework is achieved through introducing novel parameterizations of the model, giving varying degrees of parsimony, using exponential family models whose structure may be exploited in various theoretical and algorithmic ways. The algorithms are based on variational generalized EM algorithms, where the E-steps are augmented by a minorization-maximization (MM) idea. The bootstrapped standard error estimates are based on an efficient Monte Carlo network simulation idea. Last, we demonstrate the usefulness of the model-based clustering framework by applying it to a discrete-valued network with more than 131,000 nodes and 17 billion edge variables.

Item Multilevel network data facilitate statistical inference for curved ERGMs with geometrically weighted terms (Elsevier, 2019)
Stewart, Jonathan; Schweinberger, Michael; Bojanowski, Michal; Morris, Martina
Multilevel network data provide two important benefits for ERG modeling.
First, they facilitate estimation of the decay parameters in geometrically weighted terms for degree and triad distributions. Estimating decay parameters from a single network is challenging, so in practice they are typically fixed rather than estimated. Multilevel network data overcome that challenge by leveraging replication. Second, such data make it possible to assess out-of-sample performance using traditional cross-validation techniques. We demonstrate these benefits by using a multilevel network sample of classroom networks from Poland. We show that estimating the decay parameters improves in-sample performance of the model and that the out-of-sample performance of our best model is strong, suggesting that our findings can be generalized to the population of interest.