Browsing by Author "Wang, Wenyi"
Item Embargo Characterization of cancer development and recurrence through mathematical and statistical modeling (2024-04-10) Nguyen, Hoai Nam; Wang, Wenyi; Kimmel, Marek
Li-Fraumeni syndrome (LFS) is a genetic disorder characterized by deleterious germline mutations in the TP53 tumor suppressor gene. Due to the compromised DNA repair mechanisms, patients with LFS are significantly more likely to develop a spectrum of cancer types. Furthermore, it is not uncommon for LFS patients to develop multiple primary cancers. Two risk prediction models were developed for LFS: (i) a cancer-specific model that predicts cancer-specific risks for the first primary and (ii) a multiple primary cancer model that predicts the risk of a second primary without distinguishing between cancer types. Although they have been validated on research cohorts, it is essential to show that they perform well on a clinical cohort, which more closely resembles the patient data that are observed in real counseling sessions. In the first project, we validate the models in both discrimination and calibration via the Area Under the Curve (AUC) and Observed/Expected (O/E) ratio, respectively, on a dataset collected from the Clinical Cancer Genetics program at MD Anderson Cancer Center (MDACC). To expedite the dissemination of these models, we further refine the associated software tools, LFSPRO and LFSPROShiny. A major limitation of the previous models is that they do not predict cancer-specific risks beyond the first primary. In statistical survival analysis, multiple primary cancers can be regarded as recurrent events, and different cancer types can be regarded as non-terminal competing risks. Although many models have been proposed to address these two phenomena separately, a unified statistical framework remains a gap in knowledge. In the second project, we develop a generalized and interpretable Bayesian model that fully accounts for the complex relationships between the recurrent events.
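As an illustration of the two validation metrics named in the abstract above (AUC for discrimination, O/E ratio for calibration), a minimal Python sketch follows. The function names and the toy data are hypothetical; they are not taken from LFSPRO or from the thesis, and the sketch ignores censoring and follow-up time, which a real validation would need to handle.

```python
def auc(predicted_risks, labels):
    """Probability that a randomly chosen case receives a higher predicted
    risk than a randomly chosen non-case (ties count as one half)."""
    cases = [r for r, y in zip(predicted_risks, labels) if y == 1]
    controls = [r for r, y in zip(predicted_risks, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


def oe_ratio(predicted_risks, labels):
    """Observed number of events divided by the expected number,
    where the expected number is the sum of predicted risks."""
    return sum(labels) / sum(predicted_risks)


# Toy data (hypothetical): four individuals, two of whom developed cancer.
risks = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]
print(auc(risks, outcomes))       # discrimination
print(oe_ratio(risks, outcomes))  # calibration; 1.0 means perfectly calibrated on average
```

An O/E ratio above 1 indicates that the model underestimates risk on average, and a ratio below 1 indicates overestimation.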
We use a non-homogeneous Poisson process to model the occurrence processes of the competing risks, each of which is characterized by a time-dependent intensity function that follows a Cox regression model. For family datasets, we further introduce frailty terms to capture within-family correlations that are induced by the unobserved covariates, and recursively compute the family-wise likelihood via the Elston-Stewart peeling algorithm to account for the dependence of family members through missing genotypes. The model parameters are estimated via a Metropolis-Hastings-within-Gibbs sampling scheme. We train and cross-validate our model on an LFS patient cohort that is prospectively collected at MDACC. In the third project, we perform a much more extensive validation of the model on independent patient cohorts from major cancer institutes across the United States. Stem cells are closely related to cancer. Given their ability to develop into many different cell types, stem cell transplants can be used to replace cells that are damaged by high doses of radiotherapy and chemotherapy, thus accelerating the process of cancer treatment. On the other hand, stem cells survive much longer than ordinary cells, and are thus more likely to accumulate harmful genetic mutations, which have the potential to trigger carcinogenesis. During cell division, a stem cell forms a progenitor cell, which continues to differentiate into the target cell type, and renews itself. In the last project, we mathematically describe this process using a two-type age-dependent branching process.
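The division scheme described above (a stem cell renews itself and forms a progenitor, which later differentiates) can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions, not the thesis's model: the gamma-distributed lifetimes, the parameter values, the strictly asymmetric division rule, and the function name are all hypothetical. Non-exponential lifetimes are what make the process age-dependent.

```python
import heapq
import random


def simulate_two_type(t_end, stem_shape=2.0, stem_scale=1.0,
                      prog_shape=2.0, prog_scale=0.5, seed=0):
    """Return (stem_count, progenitor_count) alive at time t_end,
    starting from a single newborn stem cell at time 0."""
    rng = random.Random(seed)
    # Priority queue of (time at which a cell's lifetime ends, cell type).
    events = [(rng.gammavariate(stem_shape, stem_scale), "stem")]
    stem, prog = 1, 0
    while events and events[0][0] <= t_end:
        t, kind = heapq.heappop(events)
        if kind == "stem":
            # Assumed asymmetric division: the stem cell renews itself
            # and produces one progenitor cell.
            heapq.heappush(
                events, (t + rng.gammavariate(stem_shape, stem_scale), "stem"))
            heapq.heappush(
                events, (t + rng.gammavariate(prog_shape, prog_scale), "prog"))
            prog += 1
        else:
            # The progenitor differentiates into the target cell type and
            # leaves the two-type system.
            prog -= 1
    return stem, prog


print(simulate_two_type(10.0))
```

Under this strictly asymmetric rule the stem-cell count stays at one; allowing symmetric divisions or cell death would change the dynamics and would need additional assumptions.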
By deriving closed-form expressions of the probability generating functions, we study the behavior of such a process in both finite time and large time under different dynamics of the two cell types, which correspond to various biological scenarios.

Item Statistical Modeling for Cellular Heterogeneity Problems in Cancer Research: Deconvolution, Gaussian Graphical Models and Logistic Regression (2017-04-17) Wang, Zeya; Wang, Wenyi; Morris, Jeffrey S.; Scott, David W.
Tumor tissue samples comprise a mixture of cancerous and surrounding normal cells. Investigating cellular heterogeneity in tumors is crucial to genomic analyses associated with cancer prognosis and treatment decisions, where the contamination of non-cancerous cells may substantially affect gene expression profiling in clinically derived malignant tumor samples. For this purpose, we first computationally purify tumor profiles, and then develop new statistical modeling techniques to incorporate tumor purity estimates for genetic correlation and prediction of clinical outcome in cancer research. In this thesis, we propose novel approaches to analyzing and modeling cellular heterogeneity problems using genomic data from three perspectives. First, we develop a computational tool, DeMixT, which applies a deconvolution algorithm to explicitly account for at most three cellular components associated with cancer. Compared with the experimental approach to isolate single cells, in silico dissection of tumor samples is faster and cheaper, but computational tools previously developed have limited ability to estimate cellular proportions and tumor-specific expression profiles when neither is given with prior information. Our model allows inclusion of the infiltrating immune cells as a component as well as the tumor cells and stromal cells.
We assume a linear mixture of gene expression profiles for each component satisfying a log2-normal distribution and propose an iterated conditional modes algorithm to estimate parameters. We also introduce a novel two-stage estimation procedure for the three-component deconvolution. Our method is computationally feasible and yields accurate estimates in simulations and real data analyses. The estimated cellular proportions and purified expression profiles can provide deeper insight for cancer biomarker studies. Second, we propose a novel edge regression model for undirected graphs, which incorporates subject-level covariates to estimate the conditional dependencies. Current work on constructing graphical models for multivariate data does not take into account subject-specific information, which can bias the conditional independence structure in heterogeneous data. Especially for tumor samples with inherent contamination from normal cells, ignoring the cellular heterogeneity and modeling the population-level genomic graphs may inhibit the discovery of the true tumor graph, which would be attenuated towards the normal graph. Our model allows undirected networks to vary with the exogenous covariates and is able to borrow strength from different related graphs for estimating more robust covariate-specific graphs. Bayesian shrinkage algorithms are presented to efficiently estimate and induce sparsity for generating subject-level graphs. We demonstrate the good performance of our method through simulation studies and apply our method to cytokine measurements from blood plasma samples from hepatocellular carcinoma (HCC) patients and normal controls. Third, we build a logistic regression model that includes tumor purity as a scaling factor to improve model robustness for the purpose of both estimation and prediction.
Penalized logistic regression is used to identify variables (genes) and predict clinical status with binary outcomes that are associated with cancers in high-dimensional genomic data. We aim to reduce the uncertainty introduced by cellular heterogeneity through incorporating the measure of tumor purity to quantify the power of data for each sample. We provide strategies for choosing scaling parameters. Our model is finally shown to work well through a set of simulation studies. We believe that the statistical modeling, technical pipelines and computational results included in our work will serve as a first guide for the development of statistical methods accounting for cellular heterogeneity in cancer research.

Item Embargo Statistical Modeling of Intratumor Heterogeneity for Cancer Evolution Insights (2024-04-17) Jiang, Yujie; Wang, Wenyi; Kimmel, Marek
Tumors accumulate many somatic mutations in their lifetime, leading to intratumor heterogeneity characterized by subpopulations of tumor cells with distinct mutation profiles. This heterogeneity is a key driver of tumor evolution and therapy response, influencing the prognosis of cancers. Understanding the subclonal architecture offers vital insights into tumor evolution and the advancement of precision cancer treatment. During my Ph.D., I focused on tracking intratumor heterogeneity from bulk DNA sequencing data. I developed a new statistical model, Clonal structure identification through Pairwise Penalization (CliPP), to overcome the computational challenges in subclonal reconstruction. CliPP is the first method that clusters subclonal mutations using a regularized likelihood model, enabling rapid and accurate analysis of whole-genome and whole-exome sequencing data from over 12,000 tumor samples. I will first introduce the CliPP model, detailing its mathematical basis and the rigorous validation process on both simulated and real datasets.
This model represents a significant advancement in the field of genetic analysis of cancer. CliPP was then applied to a pan-cancer dataset comprising 7,827 tumors from 32 cancer types. It enabled the examination of our newly introduced term, subclonal mutational load (sML), across various cancers, revealing its association with patient survival outcomes. Our findings underscore sML as a critical feature of cancer, particularly in cancers presenting low and moderate tumor mutational burden (TMB). The comprehensive study with CliPP suggests that sML is crucial for understanding the evolutionary timeline of tumors and their response to treatments. All these applications involved detailed technical aspects of tumor mutation calling and copy number profiling. Recognizing the limitations of reconstructing subclones from single-region samples, I also expanded the CliPP model by integrating Consensus clustering for Subclonal Reconstruction (CSR) to enable simultaneous analysis of multiple tumor regions from one sample. As an open-ended exploration, I aim to provide a more comprehensive view of a tumor’s genetic diversity and evolutionary history, allowing for the identification of subclones that may exist exclusively in certain tumor regions.
Altogether, these projects bridge significant gaps in tumor evolution research, from enhancing the computational efficiency of tumor subclonal reconstruction to extending our understanding of the mechanism of tumor heterogeneity as well as its clinical impact.

Item The origin of bladder cancer from mucosal field effects (Cell Press, 2022) Bondaruk, Jolanta; Jaksik, Roman; Wang, Ziqiao; Cogdell, David; Lee, Sangkyou; Chen, Yujie; Dinh, Khanh Ngoc; Majewski, Tadeusz; Zhang, Li; Cao, Shaolong; Tian, Feng; Yao, Hui; Kuś, Paweł; Chen, Huiqin; Weinstein, John N.; Navai, Neema; Dinney, Colin; Gao, Jianjun; Theodorescu, Dan; Logothetis, Christopher; Guo, Charles C.; Wang, Wenyi; McConkey, David; Wei, Peng; Kimmel, Marek; Czerniak, Bogdan
Whole-organ mapping was used to study molecular changes in the evolution of bladder cancer from field effects. We identified more than 100 dysregulated pathways, involving immunity, differentiation, and transformation, as initiators of carcinogenesis. Dysregulation of interleukins signified the involvement of inflammation in the incipient phases of the process. An aberrant methylation/expression of multiple HOX genes signified dysregulation of the differentiation program. We identified three types of mutations based on their geographic distribution. The most common were mutations restricted to individual mucosal samples that targeted uroprogenitor cells. Two types of mutations were associated with clonal expansion and involved large areas of mucosa. The α mutations occurred at low frequencies while the β mutations increased in frequency with disease progression. Modeling revealed that bladder carcinogenesis spans 10–15 years and can be divided into dormant and progressive phases. The progressive phase lasted 1–2 years and was driven by β mutations.