Billion-Scale Similarity Search with GPUs

Publisher: IEEE

Abstract:
Similarity search finds application in database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks such as distance computation, prior approaches in this domain are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a novel design for k-selection. We apply it in different similarity search scenarios, by optimizing brute-force, approximate and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation operates at up to 55 percent of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5× faster than prior GPU state of the art. It enables the construction of a high accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
Published in: IEEE Transactions on Big Data ( Volume: 7, Issue: 3, 01 July 2021)
Page(s): 535 - 547
Date of Publication: 10 June 2019

1 Introduction

Images and videos constitute a new massive source of data for indexing and search. Traditional media management systems are based on relational databases built on structured data. For example, an image is indexed by metadata like capture time and location, with possible manual additions like the names of people represented within. Images can thus be queried by name, date or location. This metadata can make it possible to automatically organize photo albums.
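In contrast to such metadata queries, the brute-force similarity search the abstract refers to compares a query feature vector against every database vector and then k-selects the nearest ones. The sketch below, with hypothetical toy data, illustrates this two-phase structure in NumPy: the distance computation is the highly data-parallel part, while the k-selection step is the part the paper identifies as the GPU bottleneck. This is only an illustrative CPU sketch, not the paper's GPU implementation.

```python
import numpy as np

# Hypothetical toy data: a database of n d-dimensional feature vectors
# and a small batch of query vectors (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 16)).astype(np.float32)  # database vectors
xq = rng.standard_normal((4, 16)).astype(np.float32)     # query vectors
k = 5

# Phase 1 (data parallel): all pairwise squared L2 distances, computed
# via the expansion ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2.
d2 = ((xq ** 2).sum(1, keepdims=True)
      - 2 * xq @ xb.T
      + (xb ** 2).sum(1))

# Phase 2 (k-selection): keep the k smallest distances per query.
idx = np.argpartition(d2, k, axis=1)[:, :k]              # unordered k smallest
order = np.take_along_axis(d2, idx, axis=1).argsort(1)   # sort within the k
knn = np.take_along_axis(idx, order, axis=1)             # sorted k-NN ids
```

Using `argpartition` before the final sort avoids fully sorting each row of the distance matrix; the paper's contribution is, in essence, a k-selection design that performs this second phase efficiently on GPUs.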
