Browsing by Author "Zafar, Hamim"
Now showing 1 - 8 of 8
Item A SNP Calling And Genotyping Method For Single-cell Sequencing Data
(2015-04-23) Zafar, Hamim; Nakhleh, Luay K.; Kavraki, Lydia E; Jermaine, Chris M; Chen, Ken
In this thesis, we propose a single nucleotide polymorphism (SNP) calling and genotyping algorithm for single-cell sequencing data generated by the recently developed single-cell sequencing (SCS) technologies. SCS methods promise to address several key issues in cancer research which previously could not be resolved with data obtained from second generation or next-generation sequencing (NGS) technologies. SCS has the power to resolve the cancer genome at the single-cell level and can characterize the genomic alterations that might differ from one cell to another. SNPs are the most commonly occurring genomic variations that alter gene function in cancer. Several methods exist for calling SNPs from NGS data. However, these methods are not suitable in the SCS scenario because they do not account for the various amplification errors associated with SCS data. As a result, the existing SNP calling methods perform poorly, producing a large number of false positives when applied to SCS data. To the best of our knowledge, no SNP calling method exists that is specifically designed for SCS data. Our SNP calling algorithm is specifically designed for SCS data, and the underlying statistical model deals with the inherent errors of SCS such as allelic dropout, high bias for C:G>T:A and other amplification errors. This results in a ~50% reduction in the number of false positives and a ~30% increase in precision in calling SNPs as compared to GATK, a state-of-the-art SNP calling method for NGS data. Our algorithm also employs an improved genotyping method to properly genotype the individual cells by avoiding the sequencing errors (e.g., base calling error). Our method is the first SCS-specific SNP calling method and it can be used to characterize the SNPs present in individual cancer cells.
Potentially, it can be applied as a first step in the genealogical analysis of tumor cells for tracing the evolutionary history of a tumor.

Item Matrilysin/MMP-7 Cleavage of Perlecan/HSPG2 Complexed with Semaphorin 3A Supports FAK-Mediated Stromal Invasion by Prostate Cancer Cells
(Springer Nature, 2018) Grindel, Brian J.; Martinez, Jerahme R.; Tellman, Tristen V.; Harrington, Daniel Anton; Zafar, Hamim; Nakhleh, Luay K.; Chung, Leland W.K.; Farach-Carson, Mary C.
Interrupting the interplay between cancer cells and extracellular matrix (ECM) is a strategy to halt tumor progression and stromal invasion. Perlecan/heparan sulfate proteoglycan 2 (HSPG2) is an extracellular proteoglycan that orchestrates tumor angiogenesis, proliferation, differentiation and invasion. Metastatic prostate cancer (PCa) cells degrade perlecan-rich tissue borders to reach bone, including the basement membrane, vasculature, reactive stromal matrix and bone marrow. Domain IV-3, perlecan's last 7 immunoglobulin repeats, mimics native proteoglycan by promoting tumoroid formation. This is reversed by matrilysin/matrix metalloproteinase-7 (MMP-7) cleavage to favor cell dispersion and tumoroid dyscohesion. Both perlecan and Domain IV-3 induced a strong focal adhesion kinase (FAK) dephosphorylation/deactivation. MMP-7 cleavage of perlecan reversed this, with FAK in dispersed tumoroids becoming phosphorylated/activated with a metastatic phenotype. We demonstrated that Domain IV-3 interacts with the axon guidance protein semaphorin 3A (Sema3A) on PCa cells to deactivate pro-metastatic FAK. Sema3A antibody mimicked the Domain IV-3 clustering activity. Direct binding experiments showed that Domain IV-3 binds Sema3A. Knockdown of Sema3A prevented Domain IV-3-induced tumoroid formation, and Sema3A was sensitive to MMP-7 proteolysis. The perlecan-Sema3A complex abrogates FAK activity and stabilizes PCa cell interactions.
MMP-7 expressing cells destroy the complex to initiate metastasis, destroy perlecan-rich borders, and favor invasion and progression to lethal bone disease.

Item Monovar: single-nucleotide variant detection in single cells
(Springer Nature, 2016) Zafar, Hamim; Wang, Yong; Nakhleh, Luay; Navin, Nicholas; Chen, Ken
Current variant callers are not suitable for single-cell DNA sequencing, as they do not account for allelic dropout, false-positive errors and coverage nonuniformity. We developed Monovar (https://bitbucket.org/hamimzafar/monovar), a statistical method for detecting and genotyping single-nucleotide variants in single-cell data. Monovar exhibited superior performance over standard algorithms on benchmarks and in identifying driver mutations and delineating clonal substructure in three different human tumor data sets.

Item Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data
(Oxford University Press, 2022) Edrisi, Mohammadamin; Valecha, Monica V.; Chowdary, Sunkara B.V.; Robledo, Sergio; Ogilvie, Huw A.; Posada, David; Zafar, Hamim; Nakhleh, Luay
Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data. Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci.
Through benchmarking on simulated datasets under different settings, we show that Phylovar outperforms SCIΦ in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer dataset consisting of 32 cells and 3375 loci, as well as an scWGS dataset of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer data, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by a bulk sequencing dataset, and for the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects, some of which were associated with neurodegenerative diseases. Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.

Item SiCloneFit: Bayesian inference of population structure, genotype, and phylogeny of tumor clones from single-cell genome sequencing data
(Cold Spring Harbor Laboratory Press, 2019) Zafar, Hamim; Navin, Nicholas; Chen, Ken; Nakhleh, Luay
Accumulation and selection of somatic mutations in a Darwinian framework result in intra-tumor heterogeneity (ITH) that poses significant challenges to the diagnosis and clinical therapy of cancer. Identification of the tumor cell populations (clones) and reconstruction of their evolutionary relationship can elucidate this heterogeneity. Recently developed single-cell DNA sequencing (SCS) technologies promise to resolve ITH to a single-cell level. However, technical errors in SCS data sets, including false-positives (FP) and false-negatives (FN) due to allelic dropout, and cell doublets, significantly complicate these tasks. Here, we propose a nonparametric Bayesian method that reconstructs the clonal populations as clusters of single cells, genotypes of each clone, and the evolutionary relationship between the clones.
It employs a tree-structured Chinese restaurant process as the prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-site model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental data sets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-site model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support of the inferences it makes.

Item SiFit: inferring tumor trees from single-cell sequencing data under finite-sites models
(BioMed Central, 2017-09-19) Zafar, Hamim; Tzen, Anthony; Navin, Nicholas; Chen, Ken; Nakhleh, Luay
Single-cell sequencing enables the inference of tumor phylogenies that provide insights on intra-tumor heterogeneity and evolutionary trajectories. Recently introduced methods perform this task under the infinite-sites assumption, violations of which, due to chromosomal deletions and loss of heterozygosity, necessitate the development of inference methods that utilize finite-sites models. We propose a statistical inference method for tumor phylogenies from noisy single-cell sequencing data under a finite-sites model.
The performance of our method on synthetic and experimental data sets from two colorectal cancer patients to trace evolutionary lineages in primary and metastatic tumors suggests that employing a finite-sites model leads to improved inference of tumor phylogenies.

Item Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data
(2020-11-06) Edrisi, Mohammadamin; Nakhleh, Luay; Shrivastava, Anshumali; Treangen, Todd; Zafar, Hamim
Single-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges including elevated error rates, allelic dropout, and non-uniform coverage. Variant calling in this context is the task of identifying mutations in the genomes of individual cells while accounting for the multiple types of errors. One powerful approach for solving this task computationally is to rely on a phylogenetic context, since the genomes under analysis evolved from a common ancestor along the branches of a tree. The phylogenetic tree captures the temporal dependencies across the genomes and provides an important constraint that makes it possible to distinguish true mutations from errors that masquerade as mutations. However, this approach of simultaneously identifying mutations while accounting for the phylogenetic constraints is computationally challenging. In this thesis, I report on a new method that I developed, called scVILP, that jointly detects mutations in individual cells and reconstructs a “perfect phylogeny” of the cells (a phylogeny on which every site in the genomes mutates at most once). The method employs a novel Integer Linear Programming (ILP) formulation and utilizes publicly available ILP solvers.
Furthermore, to address the scalability issue, I developed a divide-and-conquer technique, where the ILP formulation is applied to and solved on subsets of the data, and the results are combined while resolving conflicts via constraints that are also formulated in terms of ILP. I demonstrate through analysis of simulated data sets that my method has accuracy that is similar to or better than that of existing methods, and has significantly better runtime. My method provides a promising approach for analyzing large single-cell genomic data sets.

Item Statistical Methods for Elucidating Tumor Heterogeneity and Evolution from Single-cell DNA Sequencing Data
(2018-08-08) Zafar, Hamim; Nakhleh, Luay; Chen, Ken
Intra-tumor heterogeneity, as caused by a combination of mutation and selection, poses significant challenges to the diagnosis and clinical therapy of cancer. Resolving this heterogeneity to identify the tumor cell populations (clones) and delineate their evolutionary history is of critical importance in improving cancer diagnosis and therapy. This heterogeneity can be readily elucidated and understood through the reconstruction of the clonal genotypes and evolutionary history of the tumor cells. These tasks are challenging since genomic data is most often collected from one snapshot during the evolution of the tumor's constituent cells. Consequently, using computational methods that infer the tumor phylogeny and tumor subpopulations from sequence data is the approach of choice. Recently emerged single-cell DNA sequencing (SCS) technologies promise to resolve intra-tumor heterogeneity to a single-cell level. However, inherent technical errors in SCS datasets, including false-positive (FP) errors, false-negatives (FN) due to allelic dropout, cell doublets and coverage non-uniformity, significantly complicate these tasks.
In this thesis, we first develop a likelihood-based approach for inferring tumor trees from imperfect SCS genotype data with potentially missing entries, under a finite-sites model of evolution. Our model of evolution introduces a continuous-time Markov chain that accounts for the effects of different events in tumor evolution including point mutations, loss of heterozygosity, deletion and recurrent mutations on genomic sites. Our method probabilistically accounts for false-positive and false-negative errors and missing entries in SCS datasets. With the help of a heuristic search algorithm, our method finds a maximum-likelihood solution for the phylogenetic tree that best describes the evolutionary history of the tumor cells in the SCS dataset. In doing so, our method also estimates the error rates associated with the datasets. Another contribution of this method is to infer the order of the mutations on the branches of the inferred tumor phylogeny. This is done using a maximum-likelihood-based dynamic programming algorithm. The performance of our method on synthetic and experimental datasets from two colorectal cancer patients to trace evolutionary lineages in primary and metastatic tumors suggests that employing a finite-sites model leads to an improved inference of tumor phylogenies. Secondly, we develop a non-parametric Bayesian method that simultaneously reconstructs the clonal populations as clusters of single cells, mutations associated with each clone, and the genealogical relationships between the clonal populations. It employs a tree-structured Chinese restaurant process as a prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-sites model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution.
We develop a Gibbs sampling algorithm comprising partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental datasets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-sites model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support of the inferences it makes.
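
Several of the abstracts above state that false-positive (FP) and false-negative (FN) errors and missing entries are accounted for probabilistically. As a rough illustration only, the sketch below shows one common way such error rates can enter a likelihood for a binary genotype matrix. It is not taken from any of the listed works: the function name `genotype_log_likelihood`, the parameters `fp_rate` and `fn_rate`, the use of `None` for missing entries, and the assumption that entries are independent are all hypothetical choices made for this example.

```python
import math


def genotype_log_likelihood(observed, true, fp_rate, fn_rate):
    """Log-likelihood of observed binary genotypes given assumed true genotypes.

    Illustrative assumptions (not from the listed works):
      - entries are independent given the true genotype
      - a true 0 is observed as 1 with probability fp_rate
      - a true 1 is observed as 0 with probability fn_rate
      - observed entries equal to None are missing and are skipped
    """
    log_lik = 0.0
    for obs_row, true_row in zip(observed, true):
        for obs, tru in zip(obs_row, true_row):
            if obs is None:
                continue  # a missing entry contributes no information
            if tru == 0:
                p = fp_rate if obs == 1 else 1.0 - fp_rate
            else:
                p = fn_rate if obs == 0 else 1.0 - fn_rate
            log_lik += math.log(p)
    return log_lik


# Toy data: 2 cells x 3 sites, with one missing observation.
observed = [[1, 0, None], [0, 1, 1]]
true = [[1, 1, 0], [0, 1, 1]]
print(genotype_log_likelihood(observed, true, fp_rate=0.01, fn_rate=0.2))
```

In a tree-inference setting, a quantity of this kind would typically be evaluated for the genotypes implied by each candidate tree, so that trees explaining the observations with fewer implausible errors score higher. The methods described above use richer models, for example finite-sites evolution and cell doublets, which this sketch does not attempt to reproduce.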