Browsing by Author "Merenyi, Erzsebet"
Now showing 1 - 8 of 8
Item: A neural relevance model for feature extraction from hyperspectral images, and its application in the wavelet domain (2006)
Mendenhall, Michael J.; Merenyi, Erzsebet
Our research is motivated by military applications related to aspects of contingency planning. Of recent interest is the identification of landmasses that can support the landing and takeoff of fixed-wing and rotary aircraft, where accurate classification of the surface cover is of utmost importance. In a supervised classification scenario, a natural question is whether a subset of the input features (spectral bands) could be used without degrading classification accuracy. Our interest in feature extraction is twofold. First, we desire a significantly reduced set of features by which we can compress the signal. Second, we desire to enhance classification performance by removing superfluous signal content. Feature extraction models based on PCA or wavelets judge feature importance by the magnitude of the transform coefficients, rarely leading to an appropriate set of features for classification. We analyze a recent neural paradigm, Generalized Relevance Learning Vector Quantization (GRLVQ) [1], to discover input dimensions relevant for classification. GRLVQ is based on, and substantially extends, Learning Vector Quantization (LVQ) [2] by learning relevant input dimensions while incorporating classification accuracy in the cost function. LVQ is the supervised version of Kohonen's unsupervised Self-Organizing Map [2]. LVQs iteratively adjust prototype vectors to define class boundaries while minimizing the Bayes risk. Our analysis reveals two major algorithmic deficiencies of GRLVQ. Fixing these deficiencies leads to improved convergence performance and classification accuracy. We call our improved version GRLVQ-Improved (GRLVQI). By using only the relevant spectral channels discovered by GRLVQ, we show that one can obtain classification accuracy as good as or better than that obtained by using all spectral channels.
We support this claim by running an independent classifier on the reduced feature set, using 23 classes of a real 194-band remotely sensed hyperspectral image. The higher the data dimension and/or the larger the number of classes, the more advantage GRLVQI shows over GRLVQ. The improved performance of GRLVQI over GRLVQ is substantiated using several different methods discussed in the literature. We come to the important conclusion that the improved results obtained by our GRLVQI are statistically significant. A new and exciting feature extraction model is presented by applying GRLVQI in the wavelet domain. Our model is focused on classification requirements, rather than signal reconstruction. It does not follow the largest magnitude coefficient selection as is more typical in wavelet analysis. The most relevant wavelet features turn out to be something different. Further, it allows for a linear selection of wavelet coefficients based on their computed relevances. We extend this work to complex wavelets in order to mitigate the effects of discontinuities introduced in the spectra due to the deletion of spectral bands containing irrecoverably corrupted data. The Dual-Tree Complex Wavelet Transform shows improved classification results, with feature extraction capabilities similar to those of the Critically Sampled Discrete Wavelet Transform. Our results demonstrate the superior classification and feature reduction performance of our relevance-wavelet model.

Item: Adaptive Similarity Measures for Material Identification in Hyperspectral Imagery (2013-09-16)
Bue, Brian; Merenyi, Erzsebet; Jermaine, Christopher M.; Subramanian, Devika; Wagstaff, Kiri
Remotely sensed hyperspectral imagery has become one of the most advanced tools for analyzing the processes that shape the Earth and other planets. Effective, rapid analysis of high-volume, high-dimensional hyperspectral image data sets demands efficient, automated techniques to identify signatures of known materials in such imagery.
In this thesis, we develop a framework for automatic material identification in hyperspectral imagery using adaptive similarity measures. We frame the material identification problem as a multiclass similarity-based classification problem, where our goal is to predict material labels for unlabeled target spectra based upon their similarities to source spectra with known material labels. As differences in capture conditions affect the spectral representations of materials, we divide the material identification problem into intra-domain (i.e., source and target spectra captured under identical conditions) and inter-domain (i.e., source and target spectra captured under different conditions) settings. The first component of this thesis develops adaptive similarity measures for intra-domain settings that measure the relevance of spectral features to the given classification task using small amounts of labeled data. We propose a technique based on multiclass Linear Discriminant Analysis (LDA) that combines several distinct similarity measures into a single hybrid measure capturing the strengths of each of the individual measures. We also provide a comparative survey of techniques for low-rank Mahalanobis metric learning, and demonstrate that regularized LDA yields results competitive with the state of the art, at substantially lower computational cost. The second component of this thesis shifts the focus to inter-domain settings, and proposes a multiclass domain adaptation framework that reconciles systematic differences between spectra captured under similar, but not identical, conditions. Our framework computes a similarity-based mapping that captures structured, relative relationships between classes shared between source and target domains, allowing us to apply a classifier trained using labeled source spectra to classify target spectra.
We demonstrate improved domain adaptation accuracy in comparison to recently proposed multitask learning and manifold alignment techniques in several case studies involving state-of-the-art synthetic and real-world hyperspectral imagery.

Item: Learning from SOM Voronoi Tessellations: Applications to Density Estimation and Clustering (2020-04-24)
Taylor, Josh; Merenyi, Erzsebet
This work collectively demonstrates how a study of the geometry of the Voronoi tessellations induced by manifold learning via Self-Organizing Maps (SOMs) can inform the solution of two standard problems in unsupervised learning: density estimation and clustering. In Part I we introduce methodology based on SOMs for specifying locally adaptive bandwidths for non-parametric kernel density estimation. Bandwidth selection for this new estimator, dubbed VECS (Voronoi Ellipsoidal Coordinate Smoothing), exploits both the CADJ graph of SOM prototypes and the geometries of the Voronoi regions it connects; this helps VECS avoid the computational issues stemming from minimization of the traditional error functionals used in density estimation, making it suitable for use in higher dimensions. Combining VECS estimates with the organization of the SOM output lattice offers a new visualization for high-dimensional exploratory data analysis called VECSvis. In Part II we develop DMPrune, an automated method for removing edges of a CADJ graph based on a Dirichlet-Multinomial model of its edge weights. As a topology-representing graph, CADJ guides the most sophisticated SOM-based clusterings. DMPrune offers intelligent graph pruning by assigning a score to each CADJ edge and, consulting their likelihood, monitors the impacts of edge removal to maximize edge sparsity while minimizing loss of cohesion in the most important parts of the graph. This brings us closer to high-quality, automated cluster extraction from a learned SOM.
For exposition, we exercise both methodologies on MER, a 7-band multispectral image of the Columbia Hills region of the Martian landscape imaged by the Mars Exploration Rover Spirit. Automated clusterings of the image spectra parameterized by DMPrune are shown to be of comparable quality to a previous, scientifically verified, MER clustering guided by human interaction. Additional structure appearing from these automated clusterings appears valid when compared to the VECSvis of this image.

Item: Learning the Structure of High-Dimensional Manifolds with Self-Organizing Maps for Accurate Information Extraction (2011)
Zhang, Lili; Merenyi, Erzsebet
This work aims to improve the capability of accurate information extraction from high-dimensional data, with a specific neural learning paradigm, the Self-Organizing Map (SOM). The SOM is an unsupervised learning algorithm that can faithfully sense the manifold structure and support supervised learning of relevant information from the data. Yet open problems regarding SOM learning exist. We focus on the following two issues. (1) Evaluation of topology preservation. Topology preservation is essential for SOMs in faithful representation of manifold structure. However, in reality, topology violations are not unusual, especially when the data have complicated structure. Measures capable of accurately quantifying and informatively expressing topology violations are lacking. One contribution of this work is a new measure, the Weighted Differential Topographic Function (WDTF), which differentiates an existing measure, the Topographic Function (TF), and incorporates detailed data distribution as an importance weighting of violations to distinguish severe violations from insignificant ones. Another contribution is an interactive visual tool, TopoView, which facilitates the visual inspection of violations on the SOM lattice.
We show the effectiveness of the combined use of the WDTF and TopoView through a simple two-dimensional data set and two hyperspectral images. (2) Learning multiple latent variables from high-dimensional data. We use an existing two-layer SOM-hybrid supervised architecture, which captures the manifold structure in its SOM hidden layer, and then, uses its output layer to perform the supervised learning of latent variables. In the customary way, the output layer only uses the strongest output of the SOM neurons. This severely limits the learning capability. We allow multiple, k, strongest responses of the SOM neurons for the supervised learning. Moreover, the fact that different latent variables can be best learned with different values of k motivates a new neural architecture, the Conjoined Twins, which extends the existing architecture with additional copies of the output layer, for preferential use of different values of k in the learning of different latent variables. We also automate the customization of k for different variables with the statistics derived from the SOM. The Conjoined Twins shows its effectiveness in the inference of two physical parameters from Near-Infrared spectra of planetary ices.

Item: Learning the Structure of High-Dimensional Manifolds with Self-Organizing Maps for Accurate Information Extraction (Rice University, 2011)
Zhang, Lili; Merenyi, Erzsebet
This work aims to improve the capability of accurate information extraction from high-dimensional data, with a specific neural learning paradigm, the Self-Organizing Map (SOM). The SOM is an unsupervised learning algorithm that can faithfully sense the manifold structure and support supervised learning of relevant information from the data. Yet open problems regarding SOM learning exist. We focus on the following two issues. 1. Evaluation of topology preservation. Topology preservation is essential for SOMs in faithful representation of manifold structure.
However, in reality, topology violations are not unusual, especially when the data have complicated structure. Measures capable of accurately quantifying and informatively expressing topology violations are lacking. One contribution of this work is a new measure, the Weighted Differential Topographic Function (WDTF), which differentiates an existing measure, the Topographic Function (TF), and incorporates detailed data distribution as an importance weighting of violations to distinguish severe violations from insignificant ones. Another contribution is an interactive visual tool, TopoView, which facilitates the visual inspection of violations on the SOM lattice. We show the effectiveness of the combined use of the WDTF and TopoView through a simple two-dimensional data set and two hyperspectral images. 2. Learning multiple latent variables from high-dimensional data. We use an existing two-layer SOM-hybrid supervised architecture, which captures the manifold structure in its SOM hidden layer, and then, uses its output layer to perform the supervised learning of latent variables. In the customary way, the output layer only uses the strongest output of the SOM neurons. This severely limits the learning capability. We allow multiple, k, strongest responses of the SOM neurons for the supervised learning. Moreover, the fact that different latent variables can be best learned with different values of k motivates a new neural architecture, the Conjoined Twins, which extends the existing architecture with additional copies of the output layer, for preferential use of different values of k in the learning of different latent variables. We also automate the customization of k for different variables with the statistics derived from the SOM. 
The Conjoined Twins shows its effectiveness in the inference of two physical parameters from Near-Infrared spectra of planetary ices.

Item: Multi-Modal Imaging Techniques for Early Cancer Diagnostics (2012-09-05)
Bedard, Noah; Tkaczyk, Tomasz S.; Richards-Kortum, Rebecca Rae; Merenyi, Erzsebet; Gillenwater, Ann M.
Cancer kills more Americans under the age of 75 than any other disease. Although most cancers occur in epithelial surfaces that can be directly visualized, the majority of cases are detected at an advanced stage. Optical imaging and spectroscopy may provide a solution to the need for non-invasive and effective early detection tools. These technologies are capable of examining tissue over a wide range of spatial scales, with widefield macroscopic imaging typically spanning several square centimeters, and high resolution in vivo microscopy techniques enabling cellular and subcellular features to be visualized. This work presents novel technologies in two important areas of optical imaging: high resolution imaging and widefield imaging. For subcellular imaging applications, new high resolution endomicroscope techniques are presented with improved lateral resolution, larger field-of-view, increased contrast, decreased background signal, and reduced cost compared to existing devices. A new widefield optical technology called multi-modal spectral imaging is also developed. This technique provides real-time in vivo spectral data over a large field-of-view, which is useful for detecting biochemical alterations associated with neoplasia.
The described devices are compared to existing technologies, tested using ex vivo tissue specimens, and evaluated for diagnostic potential in a multi-patient oral cancer clinical trial.

Item: The use of data topology in unsupervised clustering of high-dimensional data with self-organizing maps (2008)
Tasdemir, Kadim; Merenyi, Erzsebet
High-dimensional data is becoming increasingly common because of its rich information content, which can provide comprehensive characterization of objects (patterns) in real-world situations. Unsupervised clustering aims to utilize this rich information content for detailed discovery of distinct patterns. However, conventional clustering methods may be inadequate for capturing intricate structure in high-dimensional and large data, such as hyperspectral images or genetic microarray data. These data usually have many meaningful clusters, including interesting rare ones, whose discovery may be of great importance. Yet, faithful delineation of clusters may be impossible and rare clusters may be undiscovered due to limitations of clustering methods. A powerful method in high-dimensional data analysis is the Self-Organizing Map (SOM) [1]. An SOM is a neural learning algorithm that quantizes data spaces and spatially orders the quantization prototypes on a rigid lattice. The information learned by the SOM can be exploited to extract detailed cluster structure either by explanatory visualization or by clustering the SOM prototypes. Available SOM visualization or clustering schemes that are successful for relatively simple data often miss the finer structure in high-dimensional and large data. Our goal is to provide advanced visualization and clustering schemes for SOMs for detailed cluster extraction. The main contribution is the exploitation of the data topology inherent in the SOM's knowledge but largely underutilized in existing approaches. We achieve this by proposing a “connectivity matrix” CONN, which is a weighted Delaunay triangulation.
CONN and its specific rendering on the SOM (CONNvis) help detailed delineation of clusters which can be obscure in existing schemes. The capability of CONNvis in cluster extraction inspires a new index for the evaluation of cluster validity. The proposed index, Conn_Index, is shown to be effective in various applications of synthetic and real data sets. Based on our experiences, we expect CONN and Conn_Index to help produce an automated clustering of the SOM which may be as detailed as can be achieved with the interactive methods including our CONNvis clustering. This will be a significant achievement for structure discovery given that automated schemes in previous works produce results inferior to results from semi-manual procedures.

Item: Using Self-Organizing Maps to discover functional relationships of brain areas from fMRI images (2014-04-23)
O'Driscoll, Patrick; Merenyi, Erzsebet; Kelly, Kevin F.; Robinson, Jacob T.; Grossman, Robert; Karmonik, Christof
This thesis combines a Conscious Self-Organizing Map (SOM) with an interactive clustering method to analyze functional Magnetic Resonance Imaging (fMRI) data to produce improved brain maps compared to maps produced at The Methodist Hospital and in the literature focusing on similar problems. My new maps exhibit an increased level of symmetry, contiguity, coincidence with functional region, and more complete mapping of functional regions. The examined fMRI data contains brain activations of a subject repeatedly executing willed motion in response to a visual stimulus. Clustering the data from this experiment first determines the optimal preprocessing steps for cluster extraction, and second proves that the Conscious SOM provides a valid brain map that identifies interacting brain regions during the sequence of willed motion. I determined that the geometric rectification, motion correction, temporal smoothing, and normalization preprocessing steps facilitate the best clustering.
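
Illustrative sketch (not part of the repository record): the abstract above does not say how the Conscious SOM is implemented. As a rough illustration only, the following Python sketch trains a small SOM with a DeSieno-style "conscience" bias, on the assumption that this is the kind of mechanism meant. The function names, the 4 x 4 lattice, and every parameter value (`lr`, `sigma`, `beta`, `gamma`) are hypothetical choices made for the example, not values taken from the thesis.

```python
import math
import random


def train_conscience_som(data, rows=4, cols=4, epochs=20, lr=0.5,
                         sigma=1.5, beta=1e-3, gamma=2.0, seed=0):
    """Train a small SOM with a conscience bias (all defaults are assumptions)."""
    rng = random.Random(seed)
    n_units = rows * cols
    # Lattice coordinates of each unit, used by the neighborhood function.
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize prototypes from randomly chosen data vectors.
    protos = [list(rng.choice(data)) for _ in range(n_units)]
    # Running estimate of how often each unit wins.
    win_freq = [1.0 / n_units] * n_units
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = 1.0 - step / total_steps  # decays from 1 toward 0
            # Squared distance minus conscience bias: units that win more
            # often than 1/n_units are handicapped, rare winners are favored.
            scores = []
            for w, p in zip(protos, win_freq):
                dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
                scores.append(dist - gamma * (1.0 / n_units - p))
            winner = min(range(n_units), key=scores.__getitem__)
            # Update the win-frequency estimates.
            for i in range(n_units):
                hit = 1.0 if i == winner else 0.0
                win_freq[i] += beta * (hit - win_freq[i])
            # Gaussian neighborhood on the lattice, shrinking over time.
            radius = max(sigma * frac, 0.5)
            wr, wc = grid[winner]
            for i, (r, c) in enumerate(grid):
                h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * radius ** 2))
                protos[i] = [wi + lr * frac * h * (xi - wi)
                             for wi, xi in zip(protos[i], x)]
            step += 1
    return protos, win_freq


def quantization_error(data, protos):
    """Mean distance from each data vector to its nearest prototype."""
    return sum(min(math.dist(x, w) for w in protos) for x in data) / len(data)
```

Calling `train_conscience_som(data)` on a list of equal-length numeric vectors returns the learned prototypes and the final win-frequency estimates; `quantization_error` gives a simple check on how well the prototypes cover the data. The preprocessing steps named in the abstract and the interactive clustering of the learned map are outside the scope of this sketch.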