Probabilistic Models for Genetic and Genomic Data with Missing Information

Hicks, Stephanie

dc.contributor.advisor	Kimmel, Marek	en_US
dc.contributor.committeeMember	Thompson, James R.	en_US
dc.contributor.committeeMember	Nakhleh, Luay K.	en_US
dc.contributor.committeeMember	Plon, Sharon E.	en_US
dc.creator	Hicks, Stephanie	en_US
dc.date.accessioned	2013-09-16T15:13:21Z	en_US
dc.date.accessioned	2013-09-16T15:13:33Z	en_US
dc.date.available	2013-09-16T15:13:21Z	en_US
dc.date.available	2013-09-16T15:13:33Z	en_US
dc.date.created	2013-05	en_US
dc.date.issued	2013-09-16	en_US
dc.date.submitted	May 2013	en_US
dc.date.updated	2013-09-16T15:13:33Z	en_US
dc.description.abstract	Genetic and genomic data often contain unobservable or missing information. Probabilistic models such as mixture models and hidden Markov models (HMMs) have been widely used since the 1960s to make inferences about unobserved information from observed information, demonstrating the versatility and importance of these models. Biological applications of mixture models include the analysis of gene expression data, meta-analysis, disease mapping, epidemiology, and pharmacology; applications of HMMs include gene finding, linkage analysis, phylogenetic analysis, and identifying regions of identity-by-descent. An important statistical and informatics challenge posed by modern genetics is to understand the functional consequences of genetic variation and its relation to phenotypic variation. In the analysis of whole-exome sequencing data, predicting the impact of missense mutations on protein function is an important factor in identifying disease susceptibility mutations and determining their clinical importance in the absence of independent data on their impact on disease. In addition to this interpretation, identifying regions co-inherited by related individuals with Mendelian disorders can further narrow the search for disease susceptibility mutations. In this thesis, we develop two probabilistic models for genetic and genomic data with missing information: 1) a mixture model to estimate a posterior probability of functionality of missense mutations and 2) an HMM to identify co-inherited regions in the exomes of related individuals. The first application combines functional predictions from available computational, or in silico, methods, which often disagree substantially and leave the user with conflicting results when assessing the pathogenic impact of missense mutations on protein function.
The second application extends a first-order HMM to include conditional emission probabilities that vary as a function of minor allele frequency and a second-order dependence structure between observed variant calls. We apply these models to whole-exome sequencing data and show how they can be used to identify disease susceptibility mutations. As disease-gene identification projects increasingly use next-generation sequencing, the probabilistic models developed in this thesis help identify relevant disease-causing mutations and associate them with human disorders. The purpose of this thesis is to demonstrate that probabilistic models can contribute to more accurate and dependable inference based on genetic and genomic data with missing information.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	Hicks, Stephanie. "Probabilistic Models for Genetic and Genomic Data with Missing Information." (2013) Diss., Rice University. <a href="https://hdl.handle.net/1911/71965">https://hdl.handle.net/1911/71965</a>.	en_US
dc.identifier.slug	123456789/ETD-2013-05-506	en_US
dc.identifier.uri	https://hdl.handle.net/1911/71965	en_US
dc.language.iso	eng	en_US
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.subject	Statistics	en_US
dc.subject	Statistical genomics	en_US
dc.subject	Bioinformatics	en_US
dc.subject	Mixture models	en_US
dc.subject	Hidden Markov models	en_US
dc.title	Probabilistic Models for Genetic and Genomic Data with Missing Information	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Statistics	en_US
thesis.degree.discipline	Engineering	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	Doctor of Philosophy	en_US

Files

Original bundle

Name: HICKS-THESIS.pdf
Size: 14.83 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission

Collections

Rice University Theses and Dissertations
Test Environmental Research Collection