Statistical Methods for Elucidating Tumor Heterogeneity and Evolution from Single-cell DNA Sequencing Data

dc.contributor.advisor: Nakhleh, Luay (en_US)
dc.contributor.committeeMember: Chen, Ken (en_US)
dc.creator: Zafar, Hamim (en_US)
dc.date.accessioned: 2019-05-17T15:22:35Z (en_US)
dc.date.available: 2019-08-01T05:01:07Z (en_US)
dc.date.created: 2018-08 (en_US)
dc.date.issued: 2018-08-08 (en_US)
dc.date.submitted: August 2018 (en_US)
dc.date.updated: 2019-05-17T15:22:35Z (en_US)
dc.description.abstract: Intra-tumor heterogeneity, caused by a combination of mutation and selection, poses significant challenges to the diagnosis and clinical therapy of cancer. Resolving this heterogeneity to identify the tumor cell populations (clones) and delineate their evolutionary history is of critical importance in improving cancer diagnosis and therapy. This heterogeneity can be elucidated and understood through the reconstruction of the clonal genotypes and evolutionary history of the tumor cells. These tasks are challenging because genomic data is most often collected from a single snapshot during the evolution of the tumor's constituent cells. Consequently, using computational methods that infer the tumor phylogeny and tumor subpopulations from sequence data is the approach of choice. Recently emerged single-cell DNA sequencing (SCS) technologies promise to resolve intra-tumor heterogeneity at the single-cell level. However, inherent technical errors in SCS datasets, including false-positive (FP) errors, false-negative (FN) errors due to allelic dropout, cell doublets, and coverage non-uniformity, significantly complicate these tasks. In this thesis, we first develop a likelihood-based approach for inferring tumor trees from imperfect SCS genotype data with potentially missing entries, under a finite-sites model of evolution. Our model of evolution introduces a continuous-time Markov chain that accounts for the effects of different events in tumor evolution, including point mutations, loss of heterozygosity, deletion, and recurrent mutations on genomic sites. Our method probabilistically accounts for FP and FN errors and missing entries in SCS datasets. With the help of a heuristic search algorithm, our method finds a maximum-likelihood solution for the phylogenetic tree that best describes the evolutionary history of the tumor cells in the SCS dataset. In doing so, our method also estimates the error rates associated with the datasets.
Another contribution of this method is to infer the order of the mutations on the branches of the inferred tumor phylogeny. This is done using a maximum-likelihood-based dynamic programming algorithm. The performance of our method on synthetic datasets, and on experimental datasets from two colorectal cancer patients where it is used to trace evolutionary lineages in primary and metastatic tumors, suggests that employing a finite-sites model leads to improved inference of tumor phylogenies. Secondly, we develop a non-parametric Bayesian method that simultaneously reconstructs the clonal populations as clusters of single cells, the mutations associated with each clone, and the genealogical relationships between the clonal populations. It employs a tree-structured Chinese restaurant process as a prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-sites model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental datasets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-sites model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support for the inferences it makes. (en_US)
dc.embargo.terms: 2019-08-01 (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Zafar, Hamim. "Statistical Methods for Elucidating Tumor Heterogeneity and Evolution from Single-cell DNA Sequencing Data." (2018) Diss., Rice University. https://hdl.handle.net/1911/105773. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/105773 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: Statistical Learning (en_US)
dc.subject: Probabilistic Graphical Model (en_US)
dc.subject: Single-cell Sequencing (en_US)
dc.subject: Tumor phylogeny (en_US)
dc.subject: Intratumor Heterogeneity (en_US)
dc.title: Statistical Methods for Elucidating Tumor Heterogeneity and Evolution from Single-cell DNA Sequencing Data (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Computer Science (en_US)
thesis.degree.discipline: Engineering (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: Doctor of Philosophy (en_US)
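
Illustrative note on the abstract: the abstract states that the method probabilistically accounts for false-positive and false-negative errors and for missing entries in SCS genotype data. The sketch below shows one common way such an error model can be written. It is not taken from the thesis: the binary genotype encoding, the function name genotype_log_likelihood, the parameter names fp_rate and fn_rate, and the choice to let missing entries contribute a factor of 1 are all assumptions made for illustration.

```python
import math


def genotype_log_likelihood(observed, true, fp_rate, fn_rate):
    """Log-likelihood of an observed genotype matrix given true genotypes.

    Assumptions (hypothetical, for illustration only):
      - genotypes are binary: 0 = mutation absent, 1 = mutation present
      - None marks a missing entry, which contributes a factor of 1
      - errors are independent across cells and sites
    """
    total = 0.0
    for obs_row, true_row in zip(observed, true):
        for obs, tru in zip(obs_row, true_row):
            if obs is None:
                continue  # missing entry: no information, skip
            if tru == 0:
                # true absence: observing 1 is a false positive
                p = fp_rate if obs == 1 else 1.0 - fp_rate
            else:
                # true presence: observing 0 is a false negative
                p = fn_rate if obs == 0 else 1.0 - fn_rate
            total += math.log(p)
    return total


# Two cells (rows) by three sites (columns); None is a missing call.
observed = [[1, 0, None], [0, 1, 1]]
true = [[1, 0, 1], [1, 1, 0]]
log_lik = genotype_log_likelihood(observed, true, fp_rate=0.01, fn_rate=0.2)
```

In a tree-search setting, the true genotypes would be implied by a candidate tree and model of evolution, and a likelihood of this kind would be compared across candidates. That step is outside the scope of this sketch.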
Files
Original bundle
Name: ZAFAR-DOCUMENT-2018.pdf
Size: 7.05 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text