Probabilistic Models for Genetic and Genomic Data with Missing Information

dc.contributor.advisor: Kimmel, Marek [en_US]
dc.contributor.committeeMember: Thompson, James R. [en_US]
dc.contributor.committeeMember: Nakhleh, Luay K. [en_US]
dc.contributor.committeeMember: Plon, Sharon E. [en_US]
dc.creator: Hicks, Stephanie [en_US]
dc.date.accessioned: 2013-09-16T15:13:21Z [en_US]
dc.date.accessioned: 2013-09-16T15:13:33Z [en_US]
dc.date.available: 2013-09-16T15:13:21Z [en_US]
dc.date.available: 2013-09-16T15:13:33Z [en_US]
dc.date.created: 2013-05 [en_US]
dc.date.issued: 2013-09-16 [en_US]
dc.date.submitted: May 2013 [en_US]
dc.date.updated: 2013-09-16T15:13:33Z [en_US]
dc.description.abstract: Genetic and genomic data often contain unobservable or missing information. Probabilistic models such as mixture models and hidden Markov models (HMMs) have been widely used since the 1960s to make inferences about unobserved information from observed information, demonstrating the versatility and importance of these models. Biological applications of mixture models include the analysis of gene expression data, meta-analysis, disease mapping, epidemiology, and pharmacology; applications of HMMs include gene finding, linkage analysis, phylogenetic analysis, and identifying regions of identity-by-descent. An important statistical and informatics challenge posed by modern genetics is to understand the functional consequences of genetic variation and its relation to phenotypic variation. In the analysis of whole-exome sequencing data, predicting the impact of missense mutations on protein function is an important factor in identifying disease susceptibility mutations and determining their clinical importance when no independent data on their impact on disease are available. In addition to this interpretation, identifying regions co-inherited by related individuals with Mendelian disorders can further narrow the search for disease susceptibility mutations. In this thesis, we develop two probabilistic models for genetic and genomic data with missing information: 1) a mixture model to estimate a posterior probability of functionality of missense mutations and 2) an HMM to identify co-inherited regions in the exomes of related individuals. The first application combines functional predictions from available computational, or in silico, methods, which often disagree substantially and leave the user with conflicting results when assessing the pathogenic impact of missense mutations on protein function.
The second application extends a first-order HMM to include conditional emission probabilities that vary as a function of minor allele frequency and a second-order dependence structure between observed variant calls. We apply these models to whole-exome sequencing data and show how they can be used to identify disease susceptibility mutations. As disease-gene identification projects increasingly use next-generation sequencing, the probabilistic models developed in this thesis help identify relevant disease-causing mutations and associate them with human disorders. The purpose of this thesis is to demonstrate that probabilistic models can contribute to more accurate and dependable inference based on genetic and genomic data with missing information. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Hicks, Stephanie. "Probabilistic Models for Genetic and Genomic Data with Missing Information." (2013) Diss., Rice University. https://hdl.handle.net/1911/71965. [en_US]
dc.identifier.slug: 123456789/ETD-2013-05-506 [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/71965 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Statistics [en_US]
dc.subject: Statistical genomics [en_US]
dc.subject: Bioinformatics [en_US]
dc.subject: Mixture models [en_US]
dc.subject: Hidden Markov models [en_US]
dc.title: Probabilistic Models for Genetic and Genomic Data with Missing Information [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Statistics [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
Files
Original bundle
Name: HICKS-THESIS.pdf
Size: 14.83 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission