Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data

dc.contributor.advisor: Nakhleh, Luay
dc.contributor.committeeMember: Shrivastava, Anshumali
dc.contributor.committeeMember: Treangen, Todd
dc.contributor.committeeMember: Zafar, Hamim
dc.creator: Edrisi, Mohammadamin
dc.date.accessioned: 2020-11-24T14:28:59Z
dc.date.available: 2020-11-24T14:28:59Z
dc.date.created: 2020-12
dc.date.issued: 2020-11-06
dc.date.submitted: December 2020
dc.date.updated: 2020-11-24T14:28:59Z
dc.description.abstract: Single-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges, including elevated error rates, allelic dropout, and non-uniform coverage. Variant calling in this context is the task of identifying mutations in the genomes of individual cells while accounting for these multiple types of error. One powerful approach to solving this task computationally is to rely on a phylogenetic context, since the genomes under analysis evolved from a common ancestor along the branches of a tree. The phylogenetic tree captures the temporal dependencies across the genomes and provides an important constraint that makes it possible to distinguish true mutations from errors that masquerade as mutations. However, simultaneously identifying mutations while accounting for the phylogenetic constraints is computationally challenging. In this thesis, I report on a new method that I developed, called scVILP, that jointly detects mutations in individual cells and reconstructs a “perfect phylogeny” of the cells (a phylogeny on which every site in the genomes mutates at most once). The method employs a novel Integer Linear Programming (ILP) formulation and uses publicly available ILP solvers. Furthermore, to address scalability, I developed a divide-and-conquer technique in which the ILP formulation is applied to and solved on subsets of the data, and the results are combined while resolving conflicts via constraints that are also formulated as an ILP. I demonstrate through analysis of simulated data sets that my method has accuracy similar to or better than that of existing methods, and has significantly better runtime. My method provides a promising approach for analyzing large single-cell genomic data sets.
dc.format.mimetype: application/pdf
dc.identifier.citation: Edrisi, Mohammadamin. "Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data." (2020) Master’s Thesis, Rice University. https://hdl.handle.net/1911/109582.
dc.identifier.uri: https://hdl.handle.net/1911/109582
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Single-cell sequencing
dc.subject: Perfect Phylogeny
dc.title: Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
Files
Original bundle
Name: EDRISI-DOCUMENT-2020.pdf
Size: 28.5 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.85 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text