Gaussian mixture regression and classification

dc.contributor.advisor: Scott, David W.
dc.creator: Sung, Hsi Guang
dc.date.accessioned: 2009-06-04T07:02:33Z
dc.date.available: 2009-06-04T07:02:33Z
dc.date.issued: 2004
dc.description.abstract: The sparsity of high dimensional data space renders standard nonparametric methods ineffective for multivariate data. A new procedure, Gaussian Mixture Regression (GMR), is developed for multivariate nonlinear regression modeling. GMR has the tight structure of a parametric model, yet still retains the flexibility of a nonparametric method. The key idea of GMR is to construct a sequence of Gaussian mixture models for the joint density of the data, and then derive conditional density and regression functions from each model. Assuming the data are a random sample from the joint pdf f_{X,Y}, we fit a Gaussian kernel density model f̂_{X,Y} and then implement a multivariate extension of the Iterative Pairwise Replacement Algorithm (IPRA) to simplify the initial kernel density. IPRA generates a sequence of Gaussian mixture density models indexed by the number of mixture components K. The corresponding regression function of each density model forms a sequence of regression models which covers a spectrum of regression models of varying flexibility, ranging from approximately the classical linear model (K = 1) to the nonparametric kernel regression estimator (K = n). We use mean squared error and prediction error for selecting K. For binary responses, we extend GMR to fit nonparametric logistic regression models. Applying IPRA for each class density, we obtain two families of mixture density models. The logistic function can then be estimated by the ratio between pairs of members from each family. The result is a family of logistic models indexed by the number of mixtures in each density model. We call this procedure Gaussian Mixture Classification (GMC). For a given GMR or GMC model, forward and backward projection algorithms are implemented to locate the optimal subspaces that minimize information loss. They serve as the model-based dimension reduction techniques for GMR and GMC.
In practice, GMR and GMC offer data analysts a systematic way to determine the appropriate level of model flexibility by choosing the number of components for modeling the underlying pdf. GMC can serve as an alternative or a complement to Mixture Discriminant Analysis (MDA). The uses of GMR and GMC are demonstrated on simulated and real data.
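The conditional-mean construction described in the abstract can be sketched numerically (a minimal illustration under stated assumptions, not the dissertation's implementation; the function name `gmr_predict` and the toy parameters are made up here). For a bivariate Gaussian mixture on (X, Y), each component contributes a linear predictor, weighted by its posterior probability given x; with K = 1 this reduces to the ordinary least-squares line, matching the abstract's claim that K = 1 recovers approximately the classical linear model.

```python
import numpy as np

def gmr_predict(x0, pis, mus, covs):
    """E[Y | X = x0] under a bivariate Gaussian mixture on (X, Y).

    pis: (K,) mixture weights; mus: (K, 2) component means;
    covs: (K, 2, 2) component covariances. Illustrative sketch only.
    """
    dx = x0 - mus[:, 0]
    var_x = covs[:, 0, 0]
    # Posterior weight of component k given X = x0:
    #   w_k(x0) proportional to pi_k * N(x0; mu_{x,k}, Sigma_{xx,k})
    dens = pis * np.exp(-0.5 * dx**2 / var_x) / np.sqrt(2 * np.pi * var_x)
    w = dens / dens.sum()
    # Each component's conditional mean is linear in x0:
    #   mu_{y,k} + (Sigma_{yx,k} / Sigma_{xx,k}) * (x0 - mu_{x,k})
    cond_means = mus[:, 1] + covs[:, 1, 0] / var_x * dx
    return float(w @ cond_means)

# K = 1 check on simulated data: a single Gaussian fitted by sample
# moments gives the same prediction as the ordinary least-squares line.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=500)
data = np.column_stack([x, y])
mu = data.mean(axis=0)                  # [mean of X, mean of Y]
cov = np.cov(data, rowvar=False)        # 2x2 joint sample covariance
ols_slope, ols_intercept = np.polyfit(x, y, 1)
pred = gmr_predict(1.5, np.array([1.0]), mu[None, :], cov[None, :, :])
print(np.isclose(pred, ols_slope * 1.5 + ols_intercept))  # → True
```

For K > 1 the same function blends the component-wise linear predictors with x-dependent weights, which is what makes the regression surface nonlinear as K grows.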
dc.format.extent: 157 p.
dc.format.mimetype: application/pdf
dc.identifier.callno: THESIS STAT. 2004 SUNG
dc.identifier.citation: Sung, Hsi Guang. "Gaussian mixture regression and classification." (2004) Diss., Rice University. <a href="https://hdl.handle.net/1911/18710">https://hdl.handle.net/1911/18710</a>.
dc.identifier.uri: https://hdl.handle.net/1911/18710
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Statistics
dc.title: Gaussian mixture regression and classification
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Statistics
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle: 3122551.PDF (8.73 MB, Adobe Portable Document Format)