A best-match approach for gene set analyses in embedding spaces

Li, Lechuan; Dannenfelser, Ruth; Cruz, Charlie; Yao, Vicky

dc.citation.firstpage	1421	en_US
dc.citation.issueNumber	9	en_US
dc.citation.journalTitle	Genome Research	en_US
dc.citation.lastpage	1433	en_US
dc.citation.volumeNumber	34	en_US
dc.contributor.author	Li, Lechuan	en_US
dc.contributor.author	Dannenfelser, Ruth	en_US
dc.contributor.author	Cruz, Charlie	en_US
dc.contributor.author	Yao, Vicky	en_US
dc.date.accessioned	2024-10-29T14:11:22Z	en_US
dc.date.available	2024-10-29T14:11:22Z	en_US
dc.date.issued	2024	en_US
dc.description.abstract	Embedding methods have emerged as a valuable class of approaches for distilling essential information from complex high-dimensional data into more accessible lower-dimensional spaces. Applications of embedding methods to biological data have demonstrated that gene embeddings can effectively capture physical, structural, and functional relationships between genes. However, this utility has been primarily realized by using gene embeddings for downstream machine-learning tasks. Much less has been done to examine the embeddings directly, especially analyses of gene sets in embedding spaces. Here, we propose an Algorithm for Network Data Embedding and Similarity (ANDES), a novel best-match approach that can be used with existing gene embeddings to compare gene sets while reconciling gene set diversity. This intuitive method has important downstream implications for improving the utility of embedding spaces for various tasks. Specifically, we show how ANDES, when applied to different gene embeddings encoding protein–protein interactions, can be used as a novel overrepresentation- and rank-based gene set enrichment analysis method that achieves state-of-the-art performance. Additionally, ANDES can use multiorganism joint gene embeddings to facilitate functional knowledge transfer across organisms, allowing for phenotype mapping across model systems. Our flexible, straightforward best-match methodology can be extended to other embedding spaces with diverse community structures between set elements.	en_US
dc.identifier.citation	Li, L., Dannenfelser, R., Cruz, C., & Yao, V. (2024). A best-match approach for gene set analyses in embedding spaces. Genome Research, 34(9), 1421–1433. https://doi.org/10.1101/gr.279141.124	en_US
dc.identifier.digital	GenomeRes-2024-Li-1421-33	en_US
dc.identifier.doi	https://doi.org/10.1101/gr.279141.124	en_US
dc.identifier.uri	https://hdl.handle.net/1911/117955	en_US
dc.language.iso	eng	en_US
dc.publisher	Cold Spring Harbor Laboratory Press	en_US
dc.rights	Except where otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial (CC BY-NC) license. Permission to reuse, publish, or reproduce the work beyond the terms of the license or beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc/4.0/	en_US
dc.title	A best-match approach for gene set analyses in embedding spaces	en_US
dc.type	Journal article	en_US
dc.type.dcmi	Text	en_US
dc.type.publication	publisher version	en_US

Files

Original bundle

Name: GenomeRes-2024-Li-1421-33.pdf
Size: 2.73 MB
Format: Adobe Portable Document Format

Collections

Faculty Publications
Computer Science Publications