Dimension reduction methods with applications to high dimensional data with a censored response

dc.contributor.advisor: Rojo, Javier
dc.creator: Nguyen, Tuan S.
dc.date.accessioned: 2011-07-25T01:51:02Z
dc.date.available: 2011-07-25T01:51:02Z
dc.date.issued: 2010
dc.description.abstract: Dimension reduction methods have come to the forefront of many applications where the number of covariates, p, far exceeds the sample size, N. For example, in survival analysis studies using microarray gene expression data, 10,000 to 30,000 expressions per patient are collected, but only a few hundred patients are available for the study. The focus of this work is on linear dimension reduction methods. Attention is given to the dimension reduction method of Random Projection (RP), in which the original p-dimensional data matrix X is projected onto a k-dimensional subspace using a random matrix Γ. The motivation for RP is the Johnson-Lindenstrauss (JL) Lemma, which states that a set of N points in p-dimensional Euclidean space can be projected onto a k-dimensional Euclidean space, with k ≥ 24 ln N / (3ε² − 2ε³), such that the pairwise distances between the points are preserved within a factor of 1 ± ε. In this work, the JL Lemma is revisited when the random matrix Γ is standard Gaussian or of Achlioptas type. An improved lower bound for k is obtained by working directly with the distributions of the random distances rather than resorting to the moment generating function technique used in the literature. An improved lower bound for k is also provided when pairwise L2 distances are used in the space of the original points and pairwise L1 distances in the space of the projected points. Another popular dimension reduction method is Partial Least Squares. In this work, a variant of Partial Least Squares is proposed, denoted Rank-based Modified Partial Least Squares (RMPLS). The weight vectors of RMPLS can be seen as the solution to an optimization problem. The method is insensitive to outlying values of both the response and the covariates, and takes the censoring information into account in the construction of its weight vectors.
Results from simulated and real datasets under the Cox and Accelerated Failure Time (AFT) models indicate that RMPLS outperforms other leading methods on various measures when outliers are present in the response, and is comparable to other methods in the absence of outliers in the response.
dc.format.mimetype: application/pdf
dc.identifier.callno: THESIS STAT. 2010 NGUYEN
dc.identifier.citation: Nguyen, Tuan S. "Dimension reduction methods with applications to high dimensional data with a censored response." (2010) Diss., Rice University. https://hdl.handle.net/1911/61967.
dc.identifier.uri: https://hdl.handle.net/1911/61967
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Statistics
dc.subject: Theoretical mathematics
dc.title: Dimension reduction methods with applications to high dimensional data with a censored response
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Statistics
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
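The random projection step summarized in the abstract can be illustrated with a short pure-Python sketch. It computes the classical JL lower bound k ≥ 24 ln N / (3ε² − 2ε³) quoted in the abstract, draws a p × k matrix Γ that is either standard Gaussian or Achlioptas-type (here the sparse √3·{+1, 0, −1} distribution, one of the choices Achlioptas proposed), forms Y = XΓ/√k, and reports the range of pairwise squared-distance ratios. The 1/√k scaling, the use of squared-distance ratios, and the helper names (jl_min_dim, gaussian_matrix, achlioptas_matrix, project, distortion_range) are assumptions of this illustration, not taken from the thesis; it does not implement the thesis's improved bounds or RMPLS.

```python
# Illustrative sketch only: helper names, the 1/sqrt(k) scaling and the
# squared-distance check are assumptions, not the thesis's implementation.
import math
import random


def jl_min_dim(n_points, eps):
    """Smallest integer k with k >= 24 ln N / (3 eps^2 - 2 eps^3)."""
    if n_points < 2 or not 0 < eps < 1:
        raise ValueError("need n_points >= 2 and 0 < eps < 1")
    return math.ceil(24 * math.log(n_points) / (3 * eps**2 - 2 * eps**3))


def gaussian_matrix(p, k, rng):
    """p x k matrix with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(p)]


def achlioptas_matrix(p, k, rng):
    """p x k matrix with i.i.d. entries sqrt(3) * {+1, 0, -1}
    drawn with probabilities {1/6, 2/3, 1/6} (mean 0, variance 1)."""
    s = math.sqrt(3)

    def entry():
        u = rng.random()
        return s if u < 1 / 6 else (-s if u < 1 / 3 else 0.0)

    return [[entry() for _ in range(k)] for _ in range(p)]


def project(X, Gamma):
    """Return Y = X Gamma / sqrt(k): each length-p row of X maps to length k."""
    k = len(Gamma[0])
    scale = 1.0 / math.sqrt(k)
    cols = list(zip(*Gamma))  # the k columns of Gamma, each of length p
    return [[scale * sum(x * g for x, g in zip(row, col)) for col in cols]
            for row in X]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def distortion_range(X, Y):
    """Min and max of ||y_i - y_j||^2 / ||x_i - x_j||^2 over all pairs i < j."""
    ratios = [sq_dist(Y[i], Y[j]) / sq_dist(X[i], X[j])
              for i in range(len(X)) for j in range(i + 1, len(X))]
    return min(ratios), max(ratios)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, eps = 20, 300, 0.5
    k = jl_min_dim(n, eps)  # 144 for N = 20, eps = 0.5
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    for name, make in [("Gaussian", gaussian_matrix),
                       ("Achlioptas", achlioptas_matrix)]:
        Y = project(X, make(p, k, rng))
        lo, hi = distortion_range(X, Y)
        print(f"{name}: k={k}, squared-distance ratios in [{lo:.3f}, {hi:.3f}]")
```

Because the JL guarantee is probabilistic, an individual draw of Γ is not certain to keep every ratio within [1 − ε, 1 + ε]; the sparse Achlioptas matrix trades a little variance for having two thirds of its entries equal to zero.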
Files
Original bundle
Name: 3421400.pdf
Size: 4 MB
Format: Adobe Portable Document Format