Browsing by Author "Edrisi, Mohammadamin"
Assessing the performance of methods for copy number aberration detection from single-cell DNA sequencing data
(Public Library of Science, 2020) Mallory, Xian F.; Edrisi, Mohammadamin; Navin, Nicholas; Nakhleh, Luay
Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for or adapted to single-cell DNA sequencing data. Understanding the strengths and limitations that are unique to each of these methods is very important for obtaining accurate copy number profiles from single-cell DNA sequencing data. We benchmarked three widely used methods–Ginkgo, HMMcopy, and CopyNumber–on simulated as well as real datasets. To facilitate this, we developed a novel simulator of single-cell genome evolution in the presence of CNAs. Furthermore, to assess performance on empirical data where the ground truth is unknown, we introduce a phylogeny-based measure for identifying potentially erroneous inferences. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, our findings show that even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, with the large datasets being generated, the methods must be computationally efficient.

Computational Methods for Analyses of Single-cell DNA Sequencing Data in Cancer
(2024-04-16) Edrisi, Mohammadamin; Nakhleh, Luay
The study of cancer using single-cell sequencing technology has opened up exciting new avenues for understanding the genomic complexity and heterogeneity of this disease.
However, the analysis of such data presents computational challenges, both in designing novel mathematical models for biological discovery and in devising new methods that scale to the newly emerged large-scale single-cell sequencing data. Throughout my Ph.D. studies, I focused on multiple research projects, each of which aimed to address such computational challenges in analyzing single-cell sequencing data in the context of cancer. In this thesis, I present my contributions to three studies and their corresponding methods: Phylovar for phylogeny-aware detection of single-nucleotide variations (SNVs), MoTERNN for classifying the mode of cancer evolution, and MaCroDNA for integrating high-throughput single-cell DNA and RNA sequencing data. In Phylovar, I improved the joint inference of cancer cells' SNVs (a common type of mutation in cancer) and their phylogeny, an approach known as phylogeny-aware SNV detection. Although this approach is highly accurate, its scalability to large-scale single-cell sequencing datasets was limited. To address this, I introduced a novel vectorized formulation for computing the likelihood function of this model. The resulting speedup in likelihood computation scales accurate SNV detection from hundreds to millions of genomic loci, which suits the fast-expanding datasets from single-cell whole-genome and whole-exome sequencing technologies. MoTERNN is aimed at determining the mode of cancer evolution (linear, branching, neutral, or punctuated), each of which indicates a specific evolutionary pattern that is critical for diagnosis, prognosis, and treatment strategies. I treated this as a graph classification problem, using phylogenetic trees as graphs and evolution modes as classes, and employed Recursive Neural Networks (RvNNs) for classification.
As the first application of RvNNs to phylogenetics, MoTERNN demonstrated very high accuracy in both the training and testing phases, showcasing the potential of RvNNs for learning on phylogenetic trees. In the MaCroDNA project, I aimed to link DNA mutations to their impacts on RNA changes by pairing cells that have been sequenced for either DNA or RNA data alone. In this work, I employed a maximum weighted bipartite matching algorithm to pair the cells from the two data domains so that the sum of the Pearson correlations over all matched pairs is maximized. MaCroDNA achieved very good accuracy and outperformed the state-of-the-art method by a large margin.

Current progress and open challenges for applying deep learning across the biosciences
(Springer Nature, 2022) Sapoval, Nicolae; Aghazadeh, Amirali; Nute, Michael G.; Antunes, Dinler A.; Balaji, Advait; Baraniuk, Richard; Barberan, C.J.; Dannenfelser, Ruth; Dun, Chen; Edrisi, Mohammadamin; Elworth, R.A. Leo; Kille, Bryce; Kyrillidis, Anastasios; Nakhleh, Luay; Wolfe, Cameron R.; Yan, Zhi; Yao, Vicky; Treangen, Todd J.; Bioengineering; Computer Science
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts.
To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.

Methods for copy number aberration detection from single-cell DNA-sequencing data
(Springer Nature, 2020) Mallory, Xian F.; Edrisi, Mohammadamin; Navin, Nicholas; Nakhleh, Luay
Copy number aberrations (CNAs), which are pathogenic copy number variations (CNVs), play an important role in the initiation and progression of cancer. Single-cell DNA-sequencing (scDNAseq) technologies produce data that is ideal for inferring CNAs. In this review, we survey eight methods that have been developed for detecting CNAs in scDNAseq data, and categorize them according to the steps of a seven-step pipeline that they employ. Furthermore, we review models and methods for evolutionary analyses of CNAs from scDNAseq data and highlight advances and future research directions for computational methods for CNA detection from scDNAseq data.

Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data
(Oxford University Press, 2022) Edrisi, Mohammadamin; Valecha, Monica V.; Chowdary, Sunkara B.V.; Robledo, Sergio; Ogilvie, Huw A.; Posada, David; Zafar, Hamim; Nakhleh, Luay
Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data. Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci.
Through benchmarking on simulated datasets under different settings, we show that Phylovar outperforms SCIΦ in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer dataset consisting of 32 cells and 3375 loci, and an scWGS dataset of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer dataset, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by the bulk sequencing data. For the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects, some of which were associated with neurodegenerative diseases. Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.

Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data
(2020-11-06) Edrisi, Mohammadamin; Nakhleh, Luay; Shrivastava, Anshumali; Treangen, Todd; Zafar, Hamim
Single-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges including elevated error rates, allelic dropout, and non-uniform coverage. Variant calling in this context is the task of identifying mutations in the genomes of individual cells while accounting for the multiple types of errors. One powerful approach for solving this task computationally is to rely on a phylogenetic context, since the genomes under analysis evolved from a common ancestor along the branches of a tree. The phylogenetic tree captures the temporal dependencies across the genomes and provides an important constraint that allows one to distinguish true mutations from errors that masquerade as mutations.
However, this approach of simultaneously identifying mutations while accounting for the phylogenetic constraints is computationally challenging. In this thesis, I report on a new method that I developed, called scVILP, that jointly detects mutations in individual cells and reconstructs a “perfect phylogeny” of the cells (a phylogeny on which every site in the genomes mutates at most once). The method employs a novel Integer Linear Programming (ILP) formulation and utilizes publicly available ILP solvers. Furthermore, to address the scalability issue, I developed a divide-and-conquer technique, where the ILP formulation is applied to and solved on subsets of the data, and the results are combined while resolving conflicts via constraints that are also formulated in terms of ILP. I demonstrate through analysis of simulated data sets that my method has accuracy that is similar to or better than that of existing methods, and has significantly better runtime. My method provides a promising approach for analyzing large single-cell genomic data sets.
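The "perfect phylogeny" condition described in the abstract above (every site mutates at most once) can be illustrated with a small check on a binary cells-by-sites genotype matrix. The sketch below is not the scVILP ILP formulation. It uses the classical pairwise three-gamete test and assumes an unmutated (all-zero) root, error-free genotypes, and no missing entries. The function name `admits_perfect_phylogeny` and the toy matrices are hypothetical and serve only as illustration.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Return True if a binary cells-by-sites matrix fits a rooted perfect phylogeny.

    Assumes 0 = unmutated (root state) and 1 = mutated. Under these assumptions
    a perfect phylogeny exists exactly when no pair of sites shows all three
    patterns (0, 1), (1, 0), and (1, 1) across the cells.
    """
    if not matrix:
        return True
    n_sites = len(matrix[0])
    conflict = {(0, 1), (1, 0), (1, 1)}
    for i, j in combinations(range(n_sites), 2):
        # Collect the genotype patterns that the cells show at sites i and j.
        seen = {(row[i], row[j]) for row in matrix}
        if conflict <= seen:
            return False
    return True


# Toy data (hypothetical): rows are cells, columns are sites.
compatible = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
conflicting = [
    [1, 0],
    [0, 1],
    [1, 1],
]

print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(conflicting))
```

On real single-cell data, sequencing errors and allelic dropout routinely violate this condition. This is why a method of the kind the abstract describes infers genotypes and the tree jointly instead of testing a fixed matrix.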