Computational Methods for Analyses of Single-cell DNA Sequencing Data in Cancer

Date
2024-04-16
Abstract

The study of cancer using single-cell sequencing technology has opened up exciting new avenues for understanding the genomic complexity and heterogeneity of this disease. However, analyzing such data poses computational challenges in two areas. The first is designing novel mathematical models for biological discovery. The second is devising new methods that scale to the large single-cell sequencing datasets that have recently emerged. Throughout my Ph.D. studies, I worked on multiple research projects, each aimed at addressing such challenges in analyzing single-cell sequencing data from cancer. In this thesis, I present my contributions to three studies and their corresponding methods: Phylovar, for phylogeny-aware detection of single-nucleotide variations (SNVs); MoTERNN, for classifying the mode of cancer evolution; and MaCroDNA, for integrating high-throughput single-cell DNA and RNA sequencing data.

In Phylovar, I improved the joint inference of cancer cells' SNVs (a common type of mutation in cancer) and their phylogeny, an approach known as phylogeny-aware SNV detection. Although this approach is highly accurate, it scaled poorly to large single-cell sequencing datasets. To address this, I introduced a novel vectorized formulation for computing the model's likelihood function. This substantially sped up the likelihood calculation and allowed accurate SNV detection to scale from hundreds to millions of genomic loci. As a result, the method is suited to the fast-growing datasets produced by single-cell whole-genome and whole-exome sequencing technologies.
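
The abstract does not spell out Phylovar's likelihood formulation. The sketch below only illustrates the general idea of vectorizing a phylogeny-aware likelihood, and it rests on simplifying assumptions that are ours, not the thesis's:

- The rooted tree is stored as a hypothetical `parent` array, and cells are the leaves `0..n_cells-1`.
- Each locus carries at most one mutation, placed at a single node. Every cell below that node is mutant.
- Reads follow a toy binomial model with a hypothetical `error_rate`.

Under these assumptions, the log-likelihood of a placement is linear in the 0/1 descendant indicator. Scoring every placement at every locus therefore reduces to one matrix product rather than nested loops over nodes, cells, and loci.

```python
import numpy as np

def descendant_matrix(parent, n_cells):
    """D[v, c] = 1 if cell (leaf) c lies in the subtree rooted at node v.

    Hypothetical tree encoding: nodes 0..n_cells-1 are the cells (leaves),
    parent[v] is the parent of node v, and the root has parent -1.
    """
    D = np.zeros((len(parent), n_cells))
    for c in range(n_cells):
        v = c
        while v != -1:  # walk from the leaf up to the root
            D[v, c] = 1.0
            v = parent[v]
    return D

def read_loglik(ref, alt, error_rate=0.01):
    """Toy per-cell, per-locus log-likelihoods (cells x loci) of the read counts.

    Genotype 0: alt reads arise only from sequencing error (prob. error_rate).
    Genotype 1: heterozygous mutation, alt read probability 0.5.
    The binomial coefficient is dropped; it is identical for both genotypes.
    """
    ll0 = alt * np.log(error_rate) + ref * np.log1p(-error_rate)
    ll1 = (alt + ref) * np.log(0.5)
    return ll0, ll1

def best_placements(parent, n_cells, ref, alt):
    """Score every (placement, locus) pair with a single matrix product.

    Row v places the locus's mutation at node v (all cells below v are mutant);
    an extra all-zero row means "no cell is mutant".
    Returns the best row index per locus and the summed log-likelihood.
    """
    D = descendant_matrix(parent, n_cells)
    D = np.vstack([D, np.zeros((1, n_cells))])  # last row: unmutated locus
    ll0, ll1 = read_loglik(ref, alt)
    # sum_c [D_vc * ll1_cj + (1 - D_vc) * ll0_cj] = (D @ (ll1 - ll0))_vj + sum_c ll0_cj
    scores = D @ (ll1 - ll0) + ll0.sum(axis=0)
    best = scores.argmax(axis=0)
    return best, scores.max(axis=0).sum()

# Cells 0 and 1 share parent node 3; node 3 and cell 2 hang off root node 4.
parent = [3, 3, 4, 4, -1]
ref = np.array([[5, 10, 10], [5, 10, 10], [10, 5, 10]])  # cells x loci
alt = np.array([[5, 0, 0], [5, 0, 0], [0, 5, 0]])
best, total = best_placements(parent, 3, ref, alt)
print(best.tolist())  # [3, 2, 5]: node 3, cell 2, and "no mutation"
```

The loop in `descendant_matrix` runs once per tree. The per-locus work is the single `D @ (ll1 - ll0)` product, which is what lets this style of computation scale to many loci.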

MoTERNN aims to determine the mode of cancer evolution: linear, branching, neutral, or punctuated. Each mode indicates a specific evolutionary pattern that is critical for diagnosis, prognosis, and treatment strategies. I treated this as a graph classification problem, with phylogenetic trees as the graphs and evolution modes as the classes, and employed Recursive Neural Networks (RvNNs) for classification. As the first application of RvNNs to phylogenetics, MoTERNN achieved high accuracy in both training and testing, showcasing the potential of RvNNs for learning on phylogenetic trees.
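
As a rough illustration of how an RvNN consumes a phylogeny, the sketch below folds a binary tree bottom-up. It proceeds in three steps:

1. Each leaf's feature vector is projected into a hidden space.
2. Each internal node combines its two children's embeddings through one shared layer.
3. The root embedding is mapped to probabilities over the four modes.

The class `TreeRvNN`, the layer sizes, the `tanh` activations, and the leaf features are hypothetical and not taken from MoTERNN. The weights are random, so the output is meaningful only after training on labeled trees.

```python
import numpy as np

MODES = ["linear", "branching", "neutral", "punctuated"]

class TreeRvNN:
    """Minimal recursive neural network over binary trees (hypothetical, untrained).

    A leaf is a 1-D feature array; an internal node is a (left, right) tuple.
    The same combining weights are reused at every internal node.
    """
    def __init__(self, leaf_dim, hidden_dim, n_classes=len(MODES), seed=0):
        rng = np.random.default_rng(seed)
        self.W_leaf = rng.normal(scale=0.1, size=(hidden_dim, leaf_dim))
        self.W_comb = rng.normal(scale=0.1, size=(hidden_dim, 2 * hidden_dim))
        self.b = np.zeros(hidden_dim)
        self.W_out = rng.normal(scale=0.1, size=(n_classes, hidden_dim))

    def embed(self, node):
        if isinstance(node, tuple):  # internal node: combine child embeddings
            left, right = node
            h = np.concatenate([self.embed(left), self.embed(right)])
            return np.tanh(self.W_comb @ h + self.b)
        return np.tanh(self.W_leaf @ node)  # leaf: project its features

    def predict_proba(self, tree):
        logits = self.W_out @ self.embed(tree)
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()

rng = np.random.default_rng(1)
leaves = [rng.normal(size=8) for _ in range(4)]  # hypothetical 8-dim leaf features
tree = ((leaves[0], leaves[1]), (leaves[2], leaves[3]))
model = TreeRvNN(leaf_dim=8, hidden_dim=16)
probs = model.predict_proba(tree)
print(dict(zip(MODES, probs.round(3))))
```

Because the combining weights are shared, trees of any size and shape are processed with the same parameters, which is what makes a recursive architecture a natural fit for phylogenies.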

In the MaCroDNA project, I aimed to link DNA mutations to the RNA-level changes they cause. To do this, I paired cells sequenced only for DNA with cells sequenced only for RNA. I employed a maximum weighted bipartite matching algorithm to assign cells across the two data domains so that the sum of the Pearson correlations over all matched pairs is maximized. MaCroDNA achieved high accuracy and outperformed the state-of-the-art method by a large margin.
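
The following is a minimal sketch of the matching step. It assumes both modalities are summarized over the same genes. For example, each DNA cell might have a per-gene copy-number profile and each RNA cell an expression profile; this input choice is our assumption. The sketch also assumes there are no more DNA cells than RNA cells, so each DNA cell receives a distinct RNA cell. It computes all pairwise Pearson correlations, then solves the assignment with the Hungarian algorithm on negated correlations, which maximizes the total correlation. The helper names are hypothetical, and the one-to-one setting is a simplification that may not match how MaCroDNA handles unequal cell counts.

```python
import numpy as np

def pearson_matrix(X, Y):
    """Pearson correlation between each row of X (DNA cells x genes)
    and each row of Y (RNA cells x genes)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Xc = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    Yc = Yc / np.linalg.norm(Yc, axis=1, keepdims=True)
    return Xc @ Yc.T

def hungarian_min(cost):
    """Minimum-cost assignment of n rows to distinct columns (requires n <= m).

    Classic O(n^2 m) Hungarian algorithm with row/column potentials.
    Returns assignment[i] = column matched to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials (1-indexed)
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

def match_cells(dna, rna):
    """Pair each DNA cell with a distinct RNA cell, maximizing total Pearson correlation."""
    corr = pearson_matrix(dna, rna)
    assignment = hungarian_min((-corr).tolist())  # maximize corr == minimize -corr
    total = sum(corr[i, j] for i, j in enumerate(assignment))
    return assignment, total

dna = np.array([[1, 2, 3, 4, 5], [5, 1, 4, 2, 3], [2, 5, 1, 3, 4]], dtype=float)
rna = 2.0 * dna[[2, 0, 1]] + 1.0  # shuffled, affinely rescaled copies
pairs, total = match_cells(dna, rna)
print(pairs, round(total, 6))  # [1, 2, 0] 3.0
```

For real data, `scipy.optimize.linear_sum_assignment(corr, maximize=True)` solves the same rectangular assignment problem and is the more practical choice. The hand-written version is included only to keep the sketch dependency-light and to make the technique explicit.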

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Single-cell DNA sequencing analysis, Cancer evolutionary biology, Single-nucleotide variation detection, Single-cell RNA sequencing analysis, Single-cell multi-omics integration, Maximum-likelihood estimation, Recursive Neural Networks, Maximum weighted bipartite matching
Citation

Edrisi, Mohammadamin. Computational Methods for Analyses of Single-cell DNA Sequencing Data in Cancer. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116189

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.