Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data

dc.citation.firstpage: 1 [en_US]
dc.citation.journalTitle: Journal of Machine Learning Research [en_US]
dc.citation.lastpage: 73 [en_US]
dc.citation.volumeNumber: 22 [en_US]
dc.contributor.author: Wang, Minjie [en_US]
dc.contributor.author: Allen, Genevera I. [en_US]
dc.date.accessioned: 2021-10-20T16:32:00Z [en_US]
dc.date.available: 2021-10-20T16:32:00Z [en_US]
dc.date.issued: 2021 [en_US]
dc.description.abstract: In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden in individualistic cluster analyses of a single data view. While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that enjoys strong empirical performance and inherits the mathematical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs different convex distances, losses, or divergences for each of the different data views with a joint convex fusion penalty that leads to common groups. Additionally, integrating mixed multi-view data is often challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our so-called iGecco+ approach selects features from each data view that are best for determining the groups, often leading to improved integrative clustering. To solve our problem, we develop a new type of generalized multi-block ADMM algorithm using sub-problem approximations that more efficiently fits our model for big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data. [en_US]
dc.identifier.citation: Wang, Minjie and Allen, Genevera I. "Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data." Journal of Machine Learning Research, 22, (2021) JMLR: 1-73. https://hdl.handle.net/1911/111581 [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/111581 [en_US]
dc.language.iso: eng [en_US]
dc.publisher: JMLR [en_US]
dc.relation.uri: https://jmlr.org/papers/v22/19-1012.html [en_US]
dc.rights: License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.subject.keyword: Integrative clustering [en_US]
dc.subject.keyword: convex clustering [en_US]
dc.subject.keyword: feature selection [en_US]
dc.subject.keyword: convex optimization [en_US]
dc.subject.keyword: sparse clustering [en_US]
dc.subject.keyword: GLM deviance [en_US]
dc.subject.keyword: Bregman divergences [en_US]
dc.title: Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data [en_US]
dc.type: Journal article [en_US]
dc.type.dcmi: Text [en_US]
dc.type.publication: publisher version [en_US]
Files
Original bundle
Name: 19-1012.pdf
Size: 2.17 MB
Format: Adobe Portable Document Format