Methods for High-Dimensional Inference in Genetic Association Studies for Complex Time-to-Event Data

Date
2024-04-16
Abstract

Modern genetic repositories such as the UK Biobank have recently released massive amounts of data, including genotypes and tens of thousands of outcomes collected across hundreds of thousands of subjects. This information holds great promise for identifying genetic variants that perturb the risks of varied complex diseases. However, a major challenge is that such databases commonly record data in interval-censored form. That is, the event time of an outcome is not observed exactly but is only known to fall within a certain interval. There is a pronounced lack of tools for genetic association analysis that can be applied to interval-censored outcomes, as large-scale genetic studies have historically been conducted on binary and continuous outcomes. This work seeks to fill the gap by providing tools to perform genetic association studies with interval-censored outcomes.
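To make the idea of interval censoring concrete, the sketch below computes a log-likelihood in which each subject contributes the probability that the event falls inside the observed interval, S(left) - S(right). The exponential survival model, the function name, and the visit data are illustrative assumptions only; the abstract does not state which model the thesis uses.

```python
import math

def interval_censored_loglik(rate, intervals):
    """Log-likelihood of interval-censored data under an exponential model.

    Each interval is (left, right]: the event is known only to occur
    after `left` and at or before `right`. Use math.inf for `right`
    when the event has not been observed by the last visit.
    The exponential model is an assumption made for illustration.
    """
    def survival(t):
        # S(t) = exp(-rate * t), with S(inf) = 0
        return 0.0 if math.isinf(t) else math.exp(-rate * t)

    total = 0.0
    for left, right in intervals:
        # Probability that the event time falls in (left, right]
        total += math.log(survival(left) - survival(right))
    return total

# Hypothetical data: (last event-free visit, first visit with the event)
visits = [(0.0, 2.0), (1.0, 3.0), (4.0, math.inf)]
loglik = interval_censored_loglik(0.3, visits)
```

An exactly observed event time would instead contribute a density term, which is why methods built for exact or right-censored times do not apply directly to data recorded this way.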

In the first chapter, we develop a test to associate sets of genetic variants with multiple correlated outcomes. This test leverages the pleiotropic nature of variants and the additional information provided by multiple outcomes to increase power for detecting weak genetic effects. We use a variance components testing framework to develop two robust tests – one that is more powerful when genetic effects in a set are homogeneous and one that is more powerful when the effects are heterogeneous. We then generalize these tests into an omnibus test that uses information from both individual approaches.

In the second chapter, we extend our work by investigating a variable selection framework that identifies the specific genetic variants driving the association between a set of mutations and an outcome. Specifically, we employ Bayesian variable selection on interval-censored outcomes to fine-map variants within a risk locus. We discuss two prior specifications aimed at inducing sparsity and highlight the interpretability of the results.

In the third chapter, we present a case study demonstrating the practical application of these methodologies to real genetic repository data. Specifically, we first apply interval-censored set-based analysis to identify the genes associated with time to periodontitis and other oral diseases in both the St. Jude Lifetime Cohort Study and the UK Biobank. We then apply Bayesian variable selection to identify the specific causal variants within risk genes.

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Statistical Genetics, Survival Analysis
Citation

Choi, Jaihee. Methods for High-Dimensional Inference in Genetic Association Studies for Complex Time-to-Event Data. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116148

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.