Browsing by Author "Morris, Jeffrey S."
Now showing 1 - 4 of 4
COVID-TRACK: world and USA SARS-COV-2 testing and COVID-19 tracking (BioMed Central, 2021)
Zohner, Ye Emma; Morris, Jeffrey S.

Background: The COVID-19 pandemic has caused major health and socio-economic disruptions worldwide. Accurate investigation of emerging data is crucial to inform policy makers as they construct viral mitigation strategies. Complications such as variable testing rates and time lags in counting cases, hospitalizations, and deaths make it challenging to accurately track and identify true infectious surges from the available data; doing so requires a multi-modal approach that simultaneously considers testing, incidence, hospitalizations, and deaths. Although many websites and applications report a subset of these data, none of them provides graphical displays capable of comparing different states or countries on all of these measures, as well as on various useful quantities derived from them. Here we introduce a freely available dynamic representation tool, COVID-TRACK, that allows users to simultaneously assess time trends in these measures and compare states or countries, equipping them to investigate the potential effects of the different mitigation strategies and timelines adopted by various jurisdictions.
Findings: COVID-TRACK is a Python-based web application that provides a platform for tracking testing, incidence, hospitalizations, and deaths related to COVID-19, along with various derived quantities. The application makes comparisons across states in the USA and countries worldwide easy to explore, with useful transformation options including per capita rates, log scale, and moving averages. We illustrate its use by assessing various viral trends in the USA and Europe.
Conclusion: The COVID-TRACK web application is a user-friendly analytical tool for comparing data and trends related to the COVID-19 pandemic across areas in the United States and worldwide.
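The per capita, log-scale, and moving-average transformations listed under Findings are simple to state in code. The following is a minimal sketch, assuming daily counts arrive as a plain Python list and that rates are expressed per 100,000 residents; the function names are hypothetical, and this is not COVID-TRACK's own implementation.

```python
import math
from typing import List, Optional


def per_capita(counts: List[float], population: int, per: int = 100_000) -> List[float]:
    """Rescale raw daily counts to a rate per `per` residents."""
    return [c * per / population for c in counts]


def moving_average(values: List[float], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average; positions before the first full window are None."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def log10_scale(values: List[Optional[float]]) -> List[Optional[float]]:
    """Log10 transform; missing or non-positive values (log undefined) become None."""
    return [math.log10(v) if v is not None and v > 0 else None for v in values]


if __name__ == "__main__":
    daily_cases = [100, 120, 90, 150, 200, 180, 160, 210]  # made-up example counts
    rates = per_capita(daily_cases, population=1_000_000)
    print(log10_scale(moving_average(rates, window=7)))
```

A trailing (rather than centered) window is assumed here so that each smoothed value depends only on past data, which suits a tracker updated daily; a 7-day window also averages out day-of-week reporting artifacts.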
Our tracking tool provides a unique platform for monitoring trends across geographical areas in the coming months, to watch how the pandemic waxes and wanes over time at different locations around the USA and the globe.

Feature Learning and Bayesian Functional Regression for High-Dimensional Complex Data (2021-12-02)
Zohner, Ye Emma M; Li, Meng; Morris, Jeffrey S.

In recent years, technological innovations have facilitated the collection of complex, high-dimensional data that pose substantial modeling challenges. These complex objects are often strongly characterized by internal structure that makes sparse representations possible. If we can learn a sparse set of features that accurately captures the salient characteristics of a given object, we can then model these features using standard statistical tools, including clustering, regression, and classification. The key question is how well this sparse set of features captures the salient information in the objects. In this thesis, we develop methodology for evaluating latent feature representations of functional data and for using these latent features within functional regression frameworks to build flexible models. In the first project, we introduce a graphical latent feature representation tool (GLaRe) to learn features and assess how well a given feature learning approach captures the salient information in a data object. In the second project, we build on this feature learning methodology to propose a basis strategy for fitting functional regression models when the domain is a closed manifold. This methodology is applied to MRI data to characterize patterns of infant cortical thickness development in the first two years of life. In the third project, we adapt our feature learning and Bayesian functional regression methodology to high-frequency data streams.
We model high-frequency intraocular pressure data streams using custom bases for quantile representations of the underlying distribution, and provide insights into the etiology of glaucoma.

Statistical Approaches for Large-Scale and Complex Omics Data (2019-12-05)
Liu, Yusha; Li, Meng; Morris, Jeffrey S.

In this thesis, we propose several novel statistical approaches for analyzing large-scale and complex omics data. The thesis consists of three projects.
In the first project, with the goal of characterizing gene-level relationships between DNA methylation and gene expression, we introduce a sequential penalized regression approach to identify methylation-expression quantitative trait loci (methyl-eQTLs). We coined this term to represent, for each gene and tissue type, a sparse set of CpG loci that best explains gene expression, together with weights indicating the direction and strength of association. These weights can be used to construct gene-level methylation summaries that are maximally correlated with gene expression, for use in integrative models. Using TCGA and MD Anderson colorectal cohorts to build and validate our models, we demonstrate that our strategy explains expression variability much better than commonly used integrative methods.
In the second project, we propose a unified Bayesian framework for performing quantile regression on functional responses (FQR). Our approach represents functional coefficients with basis functions to borrow strength from nearby locations, and places a global-local shrinkage prior on the basis coefficients to achieve adaptive regularization. We develop a scalable Gibbs sampler to implement the approach. Simulation studies show that our method outperforms competing methods. We apply our method to a mass spectrometry dataset and identify proteomic biomarkers of pancreatic cancer that were entirely missed by mean-regression-based approaches.
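The building block of any quantile regression, including FQR, is the check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}), whose minimizer over a constant is the tau-th quantile. The sketch below is a deliberate simplification: it estimates a quantile curve pointwise from a sample of curves observed on a common grid, and it omits the basis representation, global-local shrinkage prior, and Gibbs sampler of the FQR approach described above. The function names are hypothetical.

```python
from typing import List, Sequence


def check_loss(u: float, tau: float) -> float:
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(values: Sequence[float], tau: float) -> float:
    """Return an observed value minimizing the summed check loss.

    The objective is convex and piecewise linear with kinks at the data,
    so for 0 < tau < 1 some observed value attains the minimum.
    """
    return min(values, key=lambda q: sum(check_loss(y - q, tau) for y in values))


def pointwise_quantile_curve(curves: List[List[float]], tau: float) -> List[float]:
    """Estimate the tau-th quantile of a sample of curves at each grid point independently."""
    n_points = len(curves[0])
    return [quantile_by_check_loss([c[t] for c in curves], tau) for t in range(n_points)]


if __name__ == "__main__":
    curves = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]  # three toy curves on a 2-point grid
    print(pointwise_quantile_curve(curves, tau=0.5))
```

Fitting each grid point independently, as here, ignores the smoothness of the functional response; representing coefficients with basis functions, as the abstract describes, is what lets neighboring locations borrow strength from one another.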
The third project is a theoretical investigation of the FQR problem that extends the second project. We propose an interpolation-based estimator that can be strongly approximated by a sequence of Gaussian processes; based on this approximation, we derive the convergence rate of the estimator and construct simultaneous confidence bands for the functional coefficient. The strong approximation results also provide a theoretical foundation for alternative approaches, which are shown to have better finite-sample performance in simulation studies.

Statistical Modeling for Cellular Heterogeneity Problems in Cancer Research: Deconvolution, Gaussian Graphical Models and Logistic Regression (2017-04-17)
Wang, Zeya; Wang, Wenyi; Morris, Jeffrey S.; Scott, David W.

Tumor tissue samples comprise a mixture of cancerous and surrounding normal cells. Investigating cellular heterogeneity in tumors is crucial to genomic analyses associated with cancer prognosis and treatment decisions, because contamination by non-cancerous cells may substantially affect the gene expression profiles of clinically derived malignant tumor samples. To address this, we first computationally purify tumor profiles and then develop new statistical modeling techniques that incorporate tumor purity estimates into genetic correlation analysis and clinical outcome prediction. In this thesis, we propose novel approaches to analyzing and modeling cellular heterogeneity using genomic data from three perspectives.
First, we develop a computational tool, DeMixT, which applies a deconvolution algorithm that explicitly accounts for up to three cellular components associated with cancer. Compared with experimental approaches that isolate single cells, in silico dissection of tumor samples is faster and cheaper, but previously developed computational tools have limited ability to estimate cellular proportions and tumor-specific expression profiles when neither is known in advance.
Our model allows infiltrating immune cells to be included as a component, in addition to tumor cells and stromal cells. We assume that observed expression is a linear mixture of component-specific gene expression profiles, each following a log2-normal distribution, and propose an iterated conditional modes algorithm to estimate the parameters. We also introduce a novel two-stage estimation procedure for the three-component deconvolution. Our method is computationally feasible and yields accurate estimates in simulations and real data analyses. The estimated cellular proportions and purified expression profiles can provide deeper insight for cancer biomarker studies.
Second, we propose a novel edge regression model for undirected graphs, which incorporates subject-level covariates to estimate conditional dependencies. Existing methods for constructing graphical models from multivariate data do not take subject-specific information into account, which can bias the estimated conditional independence structure in heterogeneous data. In particular, for tumor samples with inherent contamination from normal cells, ignoring cellular heterogeneity and modeling population-level genomic graphs may hinder discovery of the true tumor graph, which would be attenuated towards the normal graph. Our model allows undirected networks to vary with exogenous covariates and borrows strength across related graphs to estimate more robust covariate-specific graphs. We present Bayesian shrinkage algorithms that efficiently estimate, and induce sparsity in, subject-level graphs. We demonstrate the good performance of our method through simulation studies and apply it to cytokine measurements from blood plasma samples of hepatocellular carcinoma (HCC) patients and normal controls.
Third, we build a logistic regression model that includes tumor purity as a scaling factor to improve model robustness for both estimation and prediction.
Penalized logistic regression is used to identify variables (genes) associated with cancer in high-dimensional genomic data and to predict binary clinical status. We aim to reduce the uncertainty introduced by cellular heterogeneity by incorporating tumor purity as a measure of how informative each sample's data are. We provide strategies for choosing the scaling parameters. Finally, we show through a set of simulation studies that our model performs well. We believe that the statistical models, technical pipelines, and computational results in this work will serve as a first guide for the development of statistical methods that account for cellular heterogeneity in cancer research.
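One simple way to let tumor purity modulate each sample's influence is to weight the per-sample log-likelihood by its purity estimate. The sketch below does this for an L2 (ridge) penalized logistic regression fitted by plain gradient descent. Treating purity as a likelihood weight, the ridge penalty, and all function and parameter names are assumptions for illustration; this is not the scaling model or penalty used in the thesis.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    weights: Sequence[float],  # hypothetical: per-sample purity estimates in (0, 1]
    lam: float = 0.1,
    lr: float = 0.1,
    n_iter: int = 2000,
) -> List[float]:
    """Minimize (1/W) * sum_i w_i * NLL_i + (lam / 2) * ||beta||^2 by gradient descent.

    Returns [intercept, beta_1, ..., beta_p]; the intercept is not penalized.
    """
    p = len(X[0])
    coef = [0.0] * (p + 1)
    w_sum = sum(weights)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi, wi in zip(X, y, weights):
            eta = coef[0] + sum(b * x for b, x in zip(coef[1:], xi))
            resid = wi * (sigmoid(eta) - yi)  # weighted residual scales this sample's pull
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        grad = [g / w_sum for g in grad]  # normalize by total weight
        for j in range(1, p + 1):
            grad[j] += lam * coef[j]  # ridge term on slopes only
        coef = [c - lr * g for c, g in zip(coef, grad)]
    return coef


def predict_proba(coef: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the positive class for one sample."""
    return sigmoid(coef[0] + sum(b * v for b, v in zip(coef[1:], x)))
```

Setting every weight to 1 recovers ordinary ridge-penalized logistic regression, so the weights are the only place purity enters this sketch. For gene selection, a sparsity-inducing penalty such as the lasso would be more appropriate than the ridge term; it needs a proximal (soft-thresholding) update rather than plain gradient descent.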