Dimension reduction methods with applications to high dimensional data with a censored response

dc.contributor.advisor: Rojo, Javier [en_US]
dc.creator: Nguyen, Tuan S. [en_US]
dc.date.accessioned: 2011-07-25T01:51:02Z [en_US]
dc.date.available: 2011-07-25T01:51:02Z [en_US]
dc.date.issued: 2010 [en_US]
dc.description.abstract: Dimension reduction methods have come to the forefront of many applications where the number of covariates, p, far exceeds the sample size, N. For example, in survival analysis studies using microarray gene expression data, 10K to 30K expression values are collected per patient, but only a few hundred patients are available for the study. The focus of this work is on linear dimension reduction methods. Attention is given to the dimension reduction method of Random Projection (RP), in which the original p-dimensional data matrix X is projected onto a k-dimensional subspace using a random matrix Γ. The motivation for RP is the Johnson-Lindenstrauss (JL) Lemma, which states that a set of N points in p-dimensional Euclidean space can be projected onto a Euclidean space of dimension k ≥ 24 ln N / (3ε² − 2ε³) such that the pairwise distances between the points are preserved within a factor of 1 ± ε. In this work, the JL Lemma is revisited for random matrices Γ with standard Gaussian entries and with Achlioptas-type entries. An improvement on the lower bound for k is obtained by working directly with the distributions of the random distances rather than resorting to the moment generating function technique used in the literature. An improvement on the lower bound for k is also provided when pairwise L2 distances are used in the space of the original points and pairwise L1 distances in the space of the projected points. Another popular dimension reduction method is Partial Least Squares. In this work, a variant of Partial Least Squares is proposed, denoted Rank-based Modified Partial Least Squares (RMPLS). The weight vectors of RMPLS can be seen as the solution to an optimization problem. The method is insensitive to outlying values of both the response and the covariates, and takes the censoring information into account in the construction of its weight vectors.
Results from simulated and real datasets under the Cox and Accelerated Failure Time (AFT) models indicate that RMPLS outperforms other leading methods on various measures when outliers are present in the response, and is comparable to other methods in the absence of outliers in the response. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.callno: THESIS STAT. 2010 NGUYEN [en_US]
dc.identifier.citation: Nguyen, Tuan S. "Dimension reduction methods with applications to high dimensional data with a censored response." (2010) Diss., Rice University. https://hdl.handle.net/1911/61967. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/61967 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Statistics [en_US]
dc.subject: Theoretical mathematics [en_US]
dc.title: Dimension reduction methods with applications to high dimensional data with a censored response [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Statistics [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
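The random projection step and the classical JL bound described in the abstract can be illustrated with a short sketch. This is not code from the thesis. All function names (jl_min_dim, gaussian_matrix, achlioptas_matrix, project, max_distortion) are hypothetical. The sketch uses the classical bound k ≥ 24 ln N / (3ε² − 2ε³), not the improved bounds derived in the thesis. For the Achlioptas-type matrix it assumes the sparse distribution √3 · {+1, 0, −1} with probabilities {1/6, 2/3, 1/6}; Achlioptas also proposed a dense ±1 variant. Distortion is measured on squared distances, which is how the lemma is commonly stated.

```python
import math
import random


def jl_min_dim(n_points: int, eps: float) -> int:
    """Classical JL bound k >= 24 ln N / (3 eps^2 - 2 eps^3).

    Assumption: this is the textbook bound quoted in the abstract,
    not the improved bound derived in the thesis.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return math.ceil(24.0 * math.log(n_points) / (3.0 * eps**2 - 2.0 * eps**3))


def gaussian_matrix(p: int, k: int, rng: random.Random):
    """p x k matrix with i.i.d. standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(p)]


def achlioptas_matrix(p: int, k: int, rng: random.Random):
    """p x k sparse Achlioptas-type matrix (assumed variant).

    Entries are sqrt(3) * {+1, 0, -1} with probabilities {1/6, 2/3, 1/6},
    so each entry has mean 0 and variance 1, like the Gaussian case.
    """
    s = math.sqrt(3.0)
    support = (1, 0, 0, 0, 0, -1)  # a uniform draw gives weights 1/6, 2/3, 1/6
    return [[s * rng.choice(support) for _ in range(k)] for _ in range(p)]


def project(X, gamma):
    """Map N x p data X to N x k via Y = X @ gamma / sqrt(k).

    The 1/sqrt(k) scaling makes E||y_i - y_j||^2 = ||x_i - x_j||^2
    for unit-variance entries.
    """
    k = len(gamma[0])
    scale = 1.0 / math.sqrt(k)
    cols = list(zip(*gamma))  # k columns of gamma, each of length p
    return [[scale * sum(x * g for x, g in zip(row, col)) for col in cols] for row in X]


def sq_dist(u, v) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def max_distortion(X, Y) -> float:
    """Worst |squared projected distance / squared original distance - 1| over all pairs."""
    worst = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            ratio = sq_dist(Y[i], Y[j]) / sq_dist(X[i], X[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, eps = 20, 300, 0.5
    k = jl_min_dim(n, eps)  # 144 for N = 20, eps = 0.5
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    for name, make in (("gaussian", gaussian_matrix), ("achlioptas", achlioptas_matrix)):
        Y = project(X, make(p, k, rng))
        print(name, k, round(max_distortion(X, Y), 3))
```

Running the script prints, for each matrix type, the target dimension k and the worst relative error in squared pairwise distance. Whether that error falls below ε for a given draw is probabilistic. The sketch uses only the standard library to stay self-contained. In practice a NumPy matrix product would replace the pure-Python project.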
Files
Original bundle
Name: 3421400.pdf
Size: 4 MB
Format: Adobe Portable Document Format