Bayesian Models for High-Dimensional Count Data with Feature Selection

dc.contributor.advisor: Vannucci, Marina (en_US)
dc.creator: Li, Qiwei (en_US)
dc.date.accessioned: 2017-08-01T15:24:30Z (en_US)
dc.date.available: 2017-08-01T15:24:30Z (en_US)
dc.date.created: 2016-12 (en_US)
dc.date.issued: 2016-11-14 (en_US)
dc.date.submitted: December 2016 (en_US)
dc.date.updated: 2017-08-01T15:24:30Z (en_US)
dc.description.abstract: Modern big data analytics often involve large data sets in which the features of interest are measured as counts. My thesis considers the problem of modeling a high-dimensional matrix of count data and presents two novel Bayesian hierarchical frameworks, both of which incorporate a feature selection mechanism and account for the over-dispersion observed across samples as well as across features. For inference, I use Markov chain Monte Carlo (MCMC) sampling techniques with Metropolis-Hastings schemes employed in Bayesian feature selection. In the first project, on Bayesian nonparametric inference, I propose a zero-inflated Poisson mixture model that incorporates model-based normalization through prior distributions with mean constraints. The model further allows us to cluster the samples into homogeneous groups, defined by a Dirichlet process (DP), while simultaneously selecting a parsimonious set of discriminatory features. By means of a simulation study and an application to a bag-of-words benchmark data set, where the features are the frequencies of occurrence of each word, I show that my approach improves clustering accuracy relative to more standard approaches for the analysis of count data. In the second project, on Bayesian integrative analysis, I propose a negative binomial mixture regression model that integrates several characteristics. In addition to feature selection, the model includes Markov random field (MRF) prior models that capture structural dependencies among the features. The model further allows the mixture components to depend on a set of selected covariates. The simulation studies show that employing the MRF prior improves feature selection accuracy. The proposed approach is also illustrated through an application to RNA-Seq gene expression and DNA methylation data for identifying biomarkers in breast cancer. (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Li, Qiwei. "Bayesian Models for High-Dimensional Count Data with Feature Selection." (2016) Diss., Rice University. https://hdl.handle.net/1911/95966. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/95966 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: Statistics (en_US)
dc.subject: Bayesian inference (en_US)
dc.subject: High-dimensional data (en_US)
dc.subject: Count data (en_US)
dc.subject: Clustering (en_US)
dc.subject: Feature selection (en_US)
dc.subject: Regression (en_US)
dc.subject: Integrative analysis (en_US)
dc.subject: Bayesian nonparametric approaches (en_US)
dc.subject: Dirichlet process (en_US)
dc.subject: Markov chain Monte Carlo (en_US)
dc.subject: Graphical network priors (en_US)
dc.subject: Markov random field (en_US)
dc.title: Bayesian Models for High-Dimensional Count Data with Feature Selection (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Statistics (en_US)
thesis.degree.discipline: Engineering (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: Doctor of Philosophy (en_US)
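
Illustrative note: the abstract names two count distributions, a zero-inflated Poisson and a negative binomial, as the building blocks of its mixture models. The sketch below shows only the textbook log-probability functions for those two distributions, as a reading aid. The function names, the parameter names (`lam`, `pi`, `mu`, `phi`), and the mean/dispersion parameterization of the negative binomial are assumptions made here; the thesis's own hierarchical specification (normalization priors, Dirichlet process clustering, feature selection, MRF priors) is not reproduced.

```python
import math


def zip_log_pmf(y, lam, pi):
    """Log-probability of count y under a zero-inflated Poisson.

    lam -- rate of the Poisson component (lam > 0)
    pi  -- probability of an extra, structural zero (0 <= pi < 1)
    """
    if y < 0:
        raise ValueError("counts must be non-negative")
    # Ordinary Poisson log-pmf: -lam + y*log(lam) - log(y!)
    log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
    if y == 0:
        # A zero can come from the structural-zero part or the Poisson part.
        return math.log(pi + (1.0 - pi) * math.exp(log_pois))
    # A positive count can only come from the Poisson part.
    return math.log1p(-pi) + log_pois


def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under a negative binomial.

    Assumed parameterization: mean mu > 0 and dispersion phi > 0,
    so that the variance is mu + mu**2 / phi (over-dispersed
    relative to a Poisson with the same mean).
    """
    if y < 0:
        raise ValueError("counts must be non-negative")
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (mu + phi))
        + y * math.log(mu / (mu + phi))
    )


if __name__ == "__main__":
    # Probability of observing a zero with and without zero inflation.
    print(math.exp(zip_log_pmf(0, lam=2.0, pi=0.0)))
    print(math.exp(zip_log_pmf(0, lam=2.0, pi=0.3)))
```

With `pi = 0` the zero-inflated form reduces to a plain Poisson, and as `phi` grows the negative binomial approaches a Poisson with mean `mu`; both extra parameters are one common way to express the excess zeros and over-dispersion that the abstract refers to.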
Files
Original bundle
Name: LI-DOCUMENT-2016.pdf
Size: 3.81 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text