The use of data topology in unsupervised clustering of high-dimensional data with self-organizing maps

dc.contributor.advisor: Merenyi, Erzsebet [en_US]
dc.creator: Tasdemir, Kadim [en_US]
dc.date.accessioned: 2018-12-03T18:31:12Z [en_US]
dc.date.available: 2018-12-03T18:31:12Z [en_US]
dc.date.issued: 2008 [en_US]
dc.description.abstract: High-dimensional data is becoming increasingly common because of its rich information content, which can provide a comprehensive characterization of objects (patterns) in real-world situations. Unsupervised clustering aims to use this rich information content for detailed discovery of distinct patterns. However, conventional clustering methods may be inadequate for capturing intricate structure in high-dimensional and large data, such as hyperspectral images or genetic microarray data. These data usually contain many meaningful clusters, including interesting rare ones, whose discovery may be of great importance. Yet the limitations of clustering methods may make faithful delineation of clusters impossible and may leave rare clusters undiscovered. A powerful method in high-dimensional data analysis is the Self-Organizing Map (SOM) [1]. An SOM is a neural learning algorithm that quantizes the data space and spatially orders the quantization prototypes on a rigid lattice. The information learned by the SOM can be exploited to extract detailed cluster structure, either by explanatory visualization or by clustering the SOM prototypes. Available SOM visualization and clustering schemes that are successful for relatively simple data often miss the finer structure in high-dimensional and large data. Our goal is to provide advanced visualization and clustering schemes for SOMs that enable detailed cluster extraction. The main contribution is the exploitation of the data topology that is inherent in the SOM's knowledge but largely underused in existing approaches. We achieve this by proposing a "connectivity matrix" CONN, which is a weighted Delaunay triangulation. CONN and its specific rendering on the SOM (CONNvis) help delineate clusters in detail that can be obscured in existing schemes. The capability of CONNvis in cluster extraction inspires a new index for evaluating cluster validity.
The proposed index, Conn_Index, is shown to be effective in various applications to synthetic and real data sets. Based on our experience, we expect CONN and Conn_Index to help produce an automated clustering of the SOM that may be as detailed as what can be achieved with interactive methods, including our CONNvis clustering. This would be a significant achievement for structure discovery, given that the automated schemes of previous works produce results inferior to those of semi-manual procedures. [en_US]
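The abstract describes two mechanisms: an SOM that quantizes the data and orders its prototypes on a fixed lattice, and CONN, a weighted Delaunay triangulation of those prototypes. The following is a minimal NumPy sketch of both, under stated assumptions. The SOM is a textbook online Kohonen SOM with a Gaussian lattice neighborhood and linear decay, not the training setup used in the thesis. CONN is assumed to follow the induced Delaunay triangulation construction: CONN(i, j) counts the data vectors whose best-matching and second-best-matching prototypes are i and j, in either order. The function names `train_som` and `conn_matrix`, all parameter values, and the synthetic two-blob data are hypothetical illustrations.

```python
import numpy as np


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Hypothetical helper: a basic online Kohonen SOM on a rectangular lattice.

    Assumption: textbook training schedule, not the thesis's configuration.
    Returns prototypes (rows * cols, dim) and lattice coordinates (rows * cols, 2).
    """
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Fixed ("rigid") lattice positions of the prototypes.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize prototypes from randomly chosen data vectors.
    protos = data[rng.choice(n, rows * cols, replace=True)].astype(float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5    # shrinking radius, never below 0.5
        x = data[rng.integers(n)]
        bmu = np.argmin(np.sum((protos - x) ** 2, axis=1))
        # Gaussian neighborhood measured on the lattice, not in data space.
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        protos += lr * h[:, None] * (x - protos)
    return protos, grid


def conn_matrix(data, protos):
    """Hypothetical helper: CONN(i, j) = number of data vectors whose best- and
    second-best-matching prototypes are i and j, in either order. Nonzero
    entries are edges of the induced Delaunay triangulation; counts are weights."""
    k = len(protos)
    conn = np.zeros((k, k), dtype=int)
    d2 = np.sum((data[:, None, :] - protos[None, :, :]) ** 2, axis=2)
    order = np.argsort(d2, axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    np.add.at(conn, (bmu1, bmu2), 1)
    return conn + conn.T  # symmetric: count the pair regardless of order


# Synthetic example (illustrative only): two separated Gaussian blobs in 5-D.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (200, 5)), rng.normal(3.0, 0.3, (200, 5))])
protos, grid = train_som(data, rows=4, cols=4)
conn = conn_matrix(data, protos)
print(conn.sum() // 2)  # each data vector adds exactly one connection -> 400
```

In this sketch CONN depends only on the prototypes and the data, not on the lattice. The lattice matters during training, and it is also where a rendering such as CONNvis would draw the connections. Prototype pairs with large CONN weights are densely connected through the data. Pairs with zero weight have no data vectors between them, which is the kind of topology information the abstract argues existing SOM schemes underuse.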
dc.format.extent: 149 pp [en_US]
dc.identifier.callno: THESIS E.E. 2009 TASDEMIR [en_US]
dc.identifier.citation: Tasdemir, Kadim. "The use of data topology in unsupervised clustering of high-dimensional data with self-organizing maps." (2008) Diss., Rice University. https://hdl.handle.net/1911/103530. [en_US]
dc.identifier.digital: 304507680 [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/103530 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Electrical engineering [en_US]
dc.subject: Artificial intelligence [en_US]
dc.subject: Computer science [en_US]
dc.subject: Applied sciences [en_US]
dc.subject: Clustering Data mining [en_US]
dc.subject: Data topology [en_US]
dc.subject: Knowledge discovery [en_US]
dc.subject: Self-organizing maps [en_US]
dc.subject: Visualization [en_US]
dc.title: The use of data topology in unsupervised clustering of high-dimensional data with self-organizing maps [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Electrical Engineering [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
Files
Original bundle
Name: 304507680.pdf
Size: 2.97 MB
Format: Adobe Portable Document Format