Bayesian Models for High-Dimensional Count Data with Feature Selection

Li, Qiwei

dc.contributor.advisor	Vannucci, Marina	en_US
dc.creator	Li, Qiwei	en_US
dc.date.accessioned	2017-08-01T15:24:30Z	en_US
dc.date.available	2017-08-01T15:24:30Z	en_US
dc.date.created	2016-12	en_US
dc.date.issued	2016-11-14	en_US
dc.date.submitted	December 2016	en_US
dc.date.updated	2017-08-01T15:24:30Z	en_US
dc.description.abstract	Modern big data analytics often involves large data sets in which the features of interest are measured as counts. My thesis considers the problem of modeling a high-dimensional matrix of count data and presents two novel Bayesian hierarchical frameworks, both of which incorporate a feature selection mechanism and account for the over-dispersion observed across samples as well as across features. For inference, I use Markov chain Monte Carlo (MCMC) sampling techniques, with Metropolis-Hastings schemes employed for Bayesian feature selection. In the first project, on Bayesian nonparametric inference, I propose a zero-inflated Poisson mixture model that incorporates model-based normalization through prior distributions with mean constraints. The model further allows us to cluster the samples into homogeneous groups, defined by a Dirichlet process (DP), while simultaneously selecting a parsimonious set of discriminatory features. By means of a simulation study and an application to a bag-of-words benchmark data set, where the features are the frequencies of occurrence of each word, I show that my approach improves clustering accuracy relative to more standard approaches for the analysis of count data. In the second project, on Bayesian integrative analysis, I propose a negative binomial mixture regression model that integrates several characteristics. In addition to feature selection, the model includes Markov random field (MRF) prior models that capture structural dependencies among the features. The model further allows the mixture components to depend on a set of selected covariates. The simulation studies show that employing the MRF prior improves feature selection accuracy. The proposed approach is also illustrated through an application to RNA-Seq gene expression and DNA methylation data for identifying biomarkers in breast cancer.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	Li, Qiwei. "Bayesian Models for High-Dimensional Count Data with Feature Selection." (2016) Diss., Rice University. https://hdl.handle.net/1911/95966.	en_US
dc.identifier.uri	https://hdl.handle.net/1911/95966	en_US
dc.language.iso	eng	en_US
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.subject	Statistics	en_US
dc.subject	Bayesian inference	en_US
dc.subject	High-dimensional data	en_US
dc.subject	Count data	en_US
dc.subject	Clustering	en_US
dc.subject	Feature selection	en_US
dc.subject	Regression	en_US
dc.subject	Integrative analysis	en_US
dc.subject	Bayesian nonparametric approaches	en_US
dc.subject	Dirichlet process	en_US
dc.subject	Markov chain Monte Carlo	en_US
dc.subject	Graphical network priors	en_US
dc.subject	Markov random field	en_US
dc.title	Bayesian Models for High-Dimensional Count Data with Feature Selection	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Statistics	en_US
thesis.degree.discipline	Engineering	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	Doctor of Philosophy	en_US

Files

Original bundle

Name: LI-DOCUMENT-2016.pdf
Size: 3.81 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text

Collections

Rice University Theses and Dissertations