Long-Context Sequence Models for Image Retrieval
dc.contributor.advisor | Ordóñez-Román, Vicente | en_US |
dc.creator | Xiao, Zilin | en_US |
dc.date.accessioned | 2025-01-16T20:48:28Z | en_US |
dc.date.available | 2025-01-16T20:48:28Z | en_US |
dc.date.created | 2024-12 | en_US |
dc.date.issued | 2024-10-25 | en_US |
dc.date.submitted | December 2024 | en_US |
dc.date.updated | 2025-01-16T20:48:28Z | en_US |
dc.description.abstract | Image retrieval is an important problem in computer vision with many applications. Retrieval is usually cast as a metric learning problem in which a model is trained under a distance or similarity objective to compare pairs of inputs. In this thesis, we introduce the Extractive Image Re-ranker (ExtReranker), a model that takes as input the local features of a query image and a group of gallery images and outputs a refined ranking list in a single forward pass. It targets the standard two-stage image retrieval setting, in which a query image is first compared to a large database of images using global features, and the retrieved gallery is then re-ranked using finer-grained local features. ExtReranker formulates re-ranking as a span extraction task, analogous to text span extraction in natural language processing. In contrast to pair-wise correspondence learning, our approach leverages long-context sequence models to capture list-wise dependencies between query and gallery images at the local-feature level. Our approach outperforms other re-rankers on established image retrieval benchmarks (CUB-200, SOP, and In-Shop). ExtReranker also achieves state-of-the-art re-ranking performance on ROxford and RParis while using 10× fewer local descriptors and having 5× lower forward latency than alternative methods. | en_US |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.uri | https://hdl.handle.net/1911/118198 | en_US |
dc.language.iso | en | en_US |
dc.subject | image retrieval | en_US |
dc.subject | long-context language models | en_US |
dc.title | Long-Context Sequence Models for Image Retrieval | en_US |
dc.type | Thesis | en_US |
dc.type.material | Text | en_US |
thesis.degree.department | Computer Science | en_US |
thesis.degree.discipline | Computer Science | en_US |
thesis.degree.grantor | Rice University | en_US |
thesis.degree.level | Masters | en_US |
thesis.degree.name | Master of Science | en_US |