Scott, David W; Liang, Han; Wei, Peng
2019-08-01; 2020-08-01; 2019-08; 2019-07-29; August 201

Chen, Zhongyuan. "Association Studies in Human Cancers: Metabolic Expression Subtypes and Somatic Mutations/Germline Variations." (2019) Diss., Rice University. https://hdl.handle.net/1911/106172.

Cancer is a highly complex genetic disease caused by certain gene mutations. This thesis focuses on two critical categories of association studies for human cancers: associations between tumor metabolic subtypes and various other aspects of cancer, and associations between somatic mutations and germline variations. In the first category, we classify metabolic expression subtypes in multiple TCGA (The Cancer Genome Atlas) cancer types, identify consistent prognostic patterns, and analyze master regulators of metabolic subtypes. We apply various statistical methods to study the associations between the metabolic expression subtypes and patients' survival, somatic mutations, copy number variations, and hallmark pathways. The results show that the metabolic expression subtypes are extensively correlated with patients' survival. The work gives a systematic view of metabolic heterogeneity and indicates the value of metabolic expression subtypes as predictive, prognostic, and therapeutic markers. In the second category, we design data-adaptive and pathway-based large-sample score test methods for association studies between somatic mutations and germline variations. The methods combine multiple statistical techniques and aggregate information extensively at both the SNP and gene levels. p-values obtained under different parameters are combined to yield data-adaptive tests for somatic mutations and germline variations. To limit the number of parameters used and thereby reduce cost, a randomized low-rank parameter preselection strategy is proposed to predict which parameters are likely to be more effective.
In comparison with some commonly used methods, our data-adaptive test methods for somatic mutations and germline variations are much more flexible, can be applied to multiple germline SNPs, genes, and pathways, and generally have much higher statistical power. The test models are applied to both simulations and real-world ICGC (International Cancer Genome Consortium) datasets. For the ICGC data, a sequence of filtering, screening, and processing techniques is applied, followed by extensive association studies with our models. Our studies systematically identify the associations between various germline variations and somatic mutations across different cancer types. Our research provides valuable statistical tools for cancer risk prediction. The work leads to a deeper understanding of the molecular mechanisms of specific cancer genes and brings new insights into the development of novel cancer therapies.

application/pdf
eng
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
Association studies; statistical tests; human cancers; somatic mutations; germline variations; metabolic expression subtypes
Association Studies in Human Cancers: Metabolic Expression Subtypes and Somatic Mutations/Germline Variations
Thesis
2019-08-01