Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data

dc.contributor.advisor [en_US]: Nakhleh, Luay
dc.contributor.committeeMember [en_US]: Shrivastava, Anshumali
dc.contributor.committeeMember [en_US]: Treangen, Todd
dc.contributor.committeeMember [en_US]: Zafar, Hamim
dc.creator [en_US]: Edrisi, Mohammadamin
dc.date.accessioned [en_US]: 2020-11-24T14:28:59Z
dc.date.available [en_US]: 2020-11-24T14:28:59Z
dc.date.created [en_US]: 2020-12
dc.date.issued [en_US]: 2020-11-06
dc.date.submitted [en_US]: December 2020
dc.date.updated [en_US]: 2020-11-24T14:28:59Z
dc.description.abstract [en_US]: Single-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges, including elevated error rates, allelic dropout, and non-uniform coverage. Variant calling in this context is the task of identifying mutations in the genomes of individual cells while accounting for these multiple types of errors. One powerful approach to solving this task computationally is to rely on a phylogenetic context, since the genomes under analysis evolved from a common ancestor along the branches of a tree. The phylogenetic tree captures the temporal dependencies across the genomes and provides an important constraint that makes it possible to distinguish true mutations from errors that masquerade as mutations. However, simultaneously identifying mutations while accounting for the phylogenetic constraints is computationally challenging. In this thesis, I report on a new method that I developed, called scVILP, that jointly detects mutations in individual cells and reconstructs a “perfect phylogeny” of the cells (a phylogeny on which every site in the genomes mutates at most once). The method employs a novel Integer Linear Programming (ILP) formulation and utilizes publicly available ILP solvers. Furthermore, to address the scalability issue, I developed a divide-and-conquer technique in which the ILP formulation is applied to and solved on subsets of the data, and the results are combined while resolving conflicts via constraints that are also formulated in terms of ILP. I demonstrate through analysis of simulated data sets that my method achieves accuracy similar to or better than that of existing methods, with significantly shorter runtime. My method provides a promising approach for analyzing large single-cell genomic data sets.
dc.format.mimetype [en_US]: application/pdf
dc.identifier.citation [en_US]: Edrisi, Mohammadamin. "Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data." (2020) Master’s Thesis, Rice University. https://hdl.handle.net/1911/109582.
dc.identifier.uri [en_US]: https://hdl.handle.net/1911/109582
dc.language.iso [en_US]: eng
dc.rights [en_US]: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject [en_US]: Single-cell sequencing
dc.subject [en_US]: Perfect Phylogeny
dc.title [en_US]: Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data
dc.type [en_US]: Thesis
dc.type.material [en_US]: Text
thesis.degree.department [en_US]: Computer Science
thesis.degree.discipline [en_US]: Engineering
thesis.degree.grantor [en_US]: Rice University
thesis.degree.level [en_US]: Masters
thesis.degree.name [en_US]: Master of Science
Files
Original bundle
Name: EDRISI-DOCUMENT-2020.pdf
Size: 28.5 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.85 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text