Statistical Methods for Multivariate Outcomes with Applications to Biomedical Data

Ding, Maomao

Files

DING-DOCUMENT-2021.pdf (1.04 MB)

Date

2021-11-29

Authors

Ding, Maomao

Abstract

The analysis of multivariate outcomes is common practice in biomedical studies in which independence between outcomes cannot be assumed. Various mechanisms can induce dependence among multivariate outcomes. In this thesis, we consider statistical tools for three settings: competing risks data, multivariate outcomes that comprehensively capture the multidimensional symptoms of a disease, and gene co-expression networks. First, competing risks data arise naturally in biomedical studies in which subjects are at risk of several mutually exclusive causes of failure. For example, in a study of monoclonal gammopathy of undetermined significance (MGUS), the competing risks outcomes were the time to progression to a plasma cell malignancy (PCM) and the time to death without PCM. Second, when no single outcome is sufficient to quantify the multidimensional deterioration caused by a disease, investigators often rely on multiple outcomes for a comprehensive assessment of global disease status. This is exemplified by a study of Parkinson's disease, in which investigators chose five outcomes to jointly capture global disease progression. Finally, in gene co-expression network analysis, it is of interest to detect pairs of genes that exhibit a significant co-expression relationship. Identifying the gene co-expression network helps to understand the underlying biological processes systematically.

In the first project, we propose an estimator of the Polytomous Discrimination Index (PDI) for competing risks data; the PDI quantifies a prognostic model's ability to discriminate among subjects from different outcome groups. The proposed estimator remains valid when the prediction model is misspecified and has desirable asymptotic properties. We also develop an efficient algorithm for computing it, with computational complexity O(n log n), and a perturbation resampling scheme that yields consistent variance estimates. Numerical results suggest that the estimator performs well at realistic sample sizes. We apply the proposed method to a study of monoclonal gammopathy of undetermined significance and evaluate the performance of the Fine-Gray model on this dataset.
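
To illustrate where an O(n log n) cost can come from, the sketch below computes the usual complete-data empirical PDI: for each category k, the probability that a subject from category k has a higher predicted probability of k than one randomly chosen subject from every other category, averaged over k. It assumes every subject's outcome category is fully observed (no censoring) and every category is nonempty, so it is NOT the thesis's competing risks estimator, which must also account for censoring. Sorting each category's scores once and counting with binary search (np.searchsorted) gives O(M n log n) work for M categories. The hypothetical helper perturbation_se shows the generic idea of perturbation resampling: recompute the statistic under i.i.d. Exp(1) subject weights and take the standard deviation. Ties are counted as failures here, which is a simplifying assumption.

```python
import numpy as np


def pdi(prob, group, weights=None):
    """Empirical Polytomous Discrimination Index (hypothetical sketch).

    prob:    (n, M) array; prob[i, k] = predicted probability that subject i is in category k.
    group:   length-n integer array of observed categories in {0, ..., M-1}
             (assumes no censoring and that every category is nonempty).
    weights: optional nonnegative subject weights, e.g. perturbation weights.
    Ties in predicted probability count as failures (a simplification).
    """
    prob = np.asarray(prob, dtype=float)
    group = np.asarray(group)
    n, M = prob.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)

    total = 0.0
    for k in range(M):
        in_k = group == k
        s_k = prob[in_k, k]            # score for category k among category-k subjects
        w_k = w[in_k]
        prod = np.ones_like(s_k)       # per-subject product of "beats category j" counts
        denom = w_k.sum()
        for j in range(M):
            if j == k:
                continue
            in_j = group == j
            s_j = prob[in_j, k]        # score for category k among category-j subjects
            order = np.argsort(s_j)    # O(n_j log n_j) sort
            s_sorted = s_j[order]
            cum_w = np.concatenate(([0.0], np.cumsum(w[in_j][order])))
            # Weighted number of category-j subjects with a strictly smaller score,
            # found by binary search: O(n_k log n_j).
            pos = np.searchsorted(s_sorted, s_k, side="left")
            prod *= cum_w[pos]
            denom *= w[in_j].sum()
        total += np.sum(w_k * prod) / denom
    return float(total / M)


def perturbation_se(prob, group, n_rep=200, seed=0):
    """Standard error from perturbation resampling with i.i.d. Exp(1) weights."""
    rng = np.random.default_rng(seed)
    n = len(group)
    reps = [pdi(prob, group, rng.exponential(1.0, size=n)) for _ in range(n_rep)]
    return float(np.std(reps, ddof=1))
```

A model that always ranks the true category highest gives a PDI of 1, and random guessing gives about 1/M.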

In the second project, we develop a semiparametric regression strategy for single or multiple outcomes in longitudinal studies. Our method requires minimal assumptions and accommodates missing data through inverse probability weighting. We estimate the model parameters by maximizing a rank-correlation-type objective function. Under mild regularity conditions, the proposed estimators are asymptotically normal, and their asymptotic variance can be estimated by perturbation resampling. We further smooth the original discontinuous objective function by kernel smoothing; the resulting estimators have the same asymptotic distribution as the original estimators. We also propose a computationally stable and efficient optimization procedure that addresses the non-convexity of the objective function. Numerical studies show that our method performs well in realistic settings. We apply the proposed method to data from a Parkinson's disease (PD) clinical trial to examine risk factors associated with the global disease burden and/or the progression of PD.
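
As a rough illustration of a smoothed, inverse-probability-weighted rank correlation objective, the sketch below handles a single cross-sectional outcome under a monotone single-index model. The objective counts concordant pairs, with the indicator I(X_i'b > X_j'b) replaced by a logistic sigmoid of bandwidth h and each pair weighted by w_i w_j. Several pieces are assumptions, not the thesis's method: the logistic kernel, the bandwidth default, the normalization ||b|| = 1, and the brute-force grid search over two covariates, which stands in for the thesis's stable optimization procedure. The function names are hypothetical.

```python
import numpy as np


def smoothed_rank_objective(beta, X, y, w=None, h=0.1):
    """Kernel-smoothed, weighted rank-correlation objective (hypothetical sketch).

    I(X_i'b > X_j'b) is replaced by S(u / h), where S is the logistic CDF written
    stably as S(t) = 0.5 * (1 + tanh(t / 2)).
    w: optional inverse-probability weights (e.g. 1/pi_i if observed, 0 if missing).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    index = X @ beta                                       # single index X_i' b
    d_index = index[:, None] - index[None, :]              # X_i'b - X_j'b for all pairs
    concordant = (y[:, None] > y[None, :]).astype(float)   # I(Y_i > Y_j)
    pair_w = w[:, None] * w[None, :]                       # IPW weight for pair (i, j)
    smooth = 0.5 * (1.0 + np.tanh(d_index / (2.0 * h)))
    return float(np.sum(pair_w * concordant * smooth) / np.sum(pair_w))


def fit_unit_index(X, y, w=None, h=0.1, n_grid=360):
    """Grid search over b = (cos a, sin a), i.e. ||b|| = 1, for two covariates."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    values = [
        smoothed_rank_objective(np.array([np.cos(a), np.sin(a)]), X, y, w, h)
        for a in angles
    ]
    a_hat = angles[int(np.argmax(values))]
    return np.array([np.cos(a_hat), np.sin(a_hat)])
```

Because the objective depends on y only through pairwise orderings, any strictly increasing transformation of the outcome leaves the estimate unchanged. This invariance is why rank-based objectives need so few assumptions about the link between the index and the outcome.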

In the third project, we propose a class of conditional independence tests to evaluate whether Y1 and Y2 are independent conditional on covariates X. Our method relies on modeling the density functions of Y1|X and Y2|X, but the resulting test remains valid as long as one of the two density functions is correctly specified. Under mild regularity conditions, the test statistic is asymptotically normal, and its asymptotic variance can be consistently estimated. Compared with existing methods, our method is computationally efficient because it requires no bootstrap or hyperparameter tuning. We extend the method to infer the conditional independence graph and propose a multiple testing procedure that controls the false discovery rate. Numerical results suggest that our method performs well in a variety of settings and is robust to misspecification of the density functions. We apply the proposed method to a gastric cancer gene expression dataset to understand the associations between genes in the transforming growth factor β signalling pathway.
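
The "valid if one model is correct" idea has a simple conditional-mean analogue. Under H0: Y1 ⟂ Y2 | X, if E[Y1|X] = m1(X) is correctly specified, then E[(Y1 - m1(X)) g(Y2, X)] = 0 for any function g, whether or not the model for Y2 is correct. The sketch below swaps in a residual-product test of this kind, in the spirit of the generalized covariance measure, using linear least-squares mean models. This is a different technique from the thesis's density-based test. It ignores estimation effects in the normal approximation and applies the standard Benjamini-Hochberg step-up procedure as a stand-in for the thesis's FDR-controlling procedure. All function names are hypothetical.

```python
import math

import numpy as np


def residual_product_test(y1, y2, X):
    """Two-sided test of Y1 independent of Y2 given X via residual products (sketch).

    Linear models for E[Y1|X] and E[Y2|X] are an assumption standing in for the
    thesis's density models. Returns (T, p) with T approximately N(0, 1) under H0.
    """
    n = len(y1)
    Z = np.column_stack([np.ones(n), X])                  # design matrix with intercept
    r1 = y1 - Z @ np.linalg.lstsq(Z, y1, rcond=None)[0]   # residuals of Y1 on X
    r2 = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]   # residuals of Y2 on X
    prod = r1 * r2
    T = math.sqrt(n) * prod.mean() / prod.std(ddof=1)
    p = math.erfc(abs(T) / math.sqrt(2.0))                # equals 2 * (1 - Phi(|T|))
    return T, p


def benjamini_hochberg(pvals, q=0.1):
    """Boolean rejections from the Benjamini-Hochberg step-up procedure at level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))             # largest i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True
    return reject


def ci_graph(Y, X, q=0.1):
    """Edges (a, b) between columns of Y whose conditional independence null is rejected."""
    G = Y.shape[1]
    pairs = [(a, b) for a in range(G) for b in range(a + 1, G)]
    pvals = [residual_product_test(Y[:, a], Y[:, b], X)[1] for a, b in pairs]
    rejected = benjamini_hochberg(pvals, q)
    return [pair for pair, r in zip(pairs, rejected) if r]
```

No bootstrap or tuning step is involved: each pair needs only two regressions and one normal tail probability, so the cost grows with the number of gene pairs rather than with a resampling loop.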

Advisor

Li, Meng
Ning, Jing
Li, Ruosha

Degree

Doctor of Philosophy

Type

Thesis

Keywords

Competing risks, Conditional independence, Monotonic index model, Multivariate outcomes

Citation

Ding, Maomao. "Statistical Methods for Multivariate Outcomes with Applications to Biomedical Data." (2021) Diss., Rice University. https://hdl.handle.net/1911/111685.

Rights

Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.

Citable link to this page

https://hdl.handle.net/1911/111685

Collections

Rice University Theses and Dissertations
