Introduction
The constantly growing amount of available information resources has led to a high demand for scalable and efficient similarity search data structures. One of the generally used approaches for information search is K-Nearest Neighbor Search (K-NNS). K-NNS assumes a defined distance function between the data elements and aims at finding the K elements from the dataset that minimize the distance to a given query.
Exact solutions for K-NNS [3], [4], [5] may offer a substantial search speedup only in the case of relatively low dimensional data due to the "curse of dimensionality". To overcome this problem the concept of Approximate Nearest Neighbors Search (K-ANNS) was proposed, which relaxes the condition of exact search by allowing a small number of errors. The quality of an inexact search (the recall) is defined as the ratio between the number of found true nearest neighbors and K.
In this paper we propose the Hierarchical Navigable Small World (Hierarchical NSW, HNSW), a new fully graph based incremental K-ANNS structure, which can offer a much better, logarithmic complexity scaling. The main contributions are: explicit selection of the graph's enter-point node, separation of links by different scales and use of an advanced heuristic to select the neighbors. Alternatively, the Hierarchical NSW algorithm can be seen as an extension of the probabilistic skip list structure [27] with proximity graphs instead of the linked lists. Performance evaluation has demonstrated that the proposed general metric space method is able to strongly outperform previous open-source state-of-the-art approaches suitable only for vector spaces.
Related Works
2.1 Proximity Graph Techniques
In the vast majority of studied graph algorithms, searching takes the form of greedy routing in k-Nearest Neighbor (k-NN) graphs [10], [18], [19], [20], [21], [22], [23], [24], [25], [26]. For a given proximity graph, we start the search at some enter-point (it can be random or supplied by a separate algorithm) and iteratively traverse the graph. At each step of the traversal, the algorithm examines the distances from the query to the neighbors of the current base node and then selects as the next base node the adjacent node that minimizes the distance, while constantly keeping track of the best-discovered neighbors. The search is terminated when some stopping condition is met (e.g., the number of distance calculations). Links to the closest neighbors in a k-NN graph serve as a simple approximation of the Delaunay graph [25], [26] (a graph which guarantees that the result of a basic greedy graph traversal is always the nearest neighbor). Unfortunately, the Delaunay graph cannot be efficiently constructed without prior information about the structure of the space [4], but its approximation by the nearest neighbors can be done by using only the distances between the stored elements. It was shown that proximity graph approaches with such approximation perform competitively with other K-ANNS techniques, such as kd-trees or LSH [18], [19], [20], [21], [22], [23], [24], [25], [26].
The main drawbacks of the k-NN graph approaches are: 1) the power law scaling of the number of steps with the dataset size during the routing process [28], [29]; 2) a possible loss of global connectivity which leads to poor search results on clustered data. To overcome these problems many hybrid approaches have been proposed that use auxiliary algorithms applicable only for vector data (such as kd-trees [18], [19] and product quantization [10]) to find better candidates for the enter-point nodes by doing a coarse search.
In [25], [26], [30] authors proposed a proximity graph K-ANNS algorithm called Navigable Small World (NSW, also known as Metricized Small World, MSW), which utilized navigable graphs, i.e., graphs with logarithmic or polylogarithmic scaling of the number of hops during the greedy traversal with respect to the network size [31], [32]. An NSW graph is constructed via consecutive insertion of elements in random order by bidirectionally connecting them to the M closest previously inserted elements.
The construction phase of the NSW structure can be efficiently parallelized without global synchronization and without measurable effect on accuracy [26], making NSW a good choice for distributed search systems. The NSW approach delivered the state-of-the-art performance on some datasets [33], [34], however, due to the overall polylogarithmic complexity scaling, the algorithm was still prone to severe performance degradation on low dimensional datasets (on which NSW could lose to tree-based algorithms by several orders of magnitude [34]).
2.2 Navigable Small World Models
Networks with logarithmic or polylogarithmic scaling of the greedy graph routing are known as navigable small world networks [31], [32]. Such networks are an important topic of complex network theory, aimed at understanding the underlying mechanisms of real-life network formation in order to apply them to scalable routing [32], [35], [36] and distributed similarity search [25], [26], [30], [37], [38], [39], [40].
The first works to consider spatial models of navigable networks were done by J. Kleinberg [31], [41] as social network models for the famous Milgram experiment [42]. Kleinberg studied a variant of random Watts-Strogatz networks [43] using a regular lattice graph in d-dimensional vector space augmented with long-range links whose lengths follow a specific distribution proportional to r^(-α). For α = d the number of hops to reach the target by greedy routing scales polylogarithmically with the network size, while for any other value of α it follows a power law.
Another well-known class of navigable networks is the scale-free models [32], [35], [36], which can reproduce several features of real-life networks and are advertised for routing applications [35]. However, networks produced by such models have even worse power law complexity scaling of the greedy search [44] and, just like Kleinberg's model, scale-free models require global knowledge of the data distribution, making them unusable for search applications.
The above-described NSW algorithm uses a simpler, previously unknown model of navigable networks, allowing decentralized graph construction and suitable for data in arbitrary spaces. It was suggested [44] that the NSW network formation mechanism may be responsible for navigability of large-scale biological neural networks (presence of which is disputable): similar models were able to describe the growth of small brain networks, while the model predicts several high-level features observed in large-scale neural networks. However, the NSW model also suffers from the polylogarithmic search complexity of the routing process.
Motivation
The ways of improving the NSW search complexity can be identified through the analysis of the routing process, which was studied in detail in [32], [44]. The routing can be divided into two phases: "zoom-out" and "zoom-in" [32]. The greedy algorithm starts in the "zoom-out" phase from a low degree node and traverses the graph, simultaneously increasing the node degree, until the characteristic radius of the node's links reaches the scale of the distance to the query. Before the latter happens, the average degree of a node can stay relatively small, which leads to an increased probability of being stuck in a distant false local minimum.
One can avoid the described problem in NSW by starting the search from a node with the maximum degree (good candidates are the first nodes inserted in the NSW structure [44]), directly going to the “zoom-in” phase of the search. Tests show that setting the first nodes as enter-points substantially increases the probability of successful routing in the structure and provides significantly better performance on low dimensional data. However, it still has only polylogarithmic complexity scalability of a single greedy search at best and performs worse on high dimensional data compared to Hierarchical NSW.
The reason for the polylogarithmic complexity scaling of a single greedy search in NSW is that the overall number of distance computations is roughly proportional to the product of the average number of greedy algorithm hops and the average degree of the nodes on the greedy path. The average number of hops scales logarithmically [26], [44], while the average degree of the nodes on the greedy path also scales logarithmically due to the following facts: 1) the greedy search tends to go through the same hubs as the network grows [32], [44]; 2) the average number of hub connections grows logarithmically with an increase of the network size. Thus we get an overall polylogarithmic dependence of the resulting complexity.
The idea of Hierarchical NSW algorithm is to separate the links according to their length into different layers and then search the multilayer graph. In this case, we can evaluate only a fixed number of the connections for each element independently of the network size, thus allowing logarithmic scalability. In such structure, the search starts from the upper layer which has only the longest links (the “zoom-in” phase). The algorithm greedily traverses the upper layer until a local minimum is reached (see Fig. 1 for illustration). After that, the search switches to the lower layer (which has shorter links), restarts from the element which was the local minimum in the previous layer and the process repeats. The maximum number of connections per element in all layers can be made constant, thus allowing a logarithmic complexity scaling of routing in a navigable small world network.
Illustration of the Hierarchical NSW idea. The search starts from an element from the top layer (shown red). Red arrows show direction of the greedy algorithm from the entry point to the query (shown green).
One way to form such a layered structure is to explicitly set links with different length scales by introducing layers. For every element we select an integer level l which defines the maximum layer to which the element belongs. For all elements in a layer a proximity graph (i.e., a graph containing only "short" links approximating the Delaunay graph) is built incrementally. If we set an exponentially decaying probability of l (i.e., following a geometric distribution), we get a logarithmic scaling of the expected number of layers in the structure.
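As an illustration, a minimal Python sketch of this level assignment (the ⌊-ln(unif(0,1))·mL⌋ rule used during insertion, with the mL = 1/ln(M) choice discussed in Section 4.1; the function name and defaults here are ours, not part of the paper):

import math
import random

def random_level(M: int, rng: random.Random = random.Random()) -> int:
    """Draw the maximum layer of a new element: exponentially decaying
    probability of reaching higher layers, normalized by mL = 1/ln(M)."""
    mL = 1.0 / math.log(M)
    u = 1.0 - rng.random()          # uniform in (0, 1], avoids log(0)
    return int(-math.log(u) * mL)   # floor of -ln(u) * mL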
In case we combine the elements' connections from all layers, the structure becomes similar to the NSW graph (in this case mL can be set to zero). In contrast to NSW, the Hierarchical NSW construction algorithm does not require the elements to be shuffled before insertion, since the stochasticity is achieved by the level randomization, thus allowing truly incremental indexing.
The Hierarchical NSW idea is also very similar to a well-known 1D probabilistic skip list structure [27] and can be described using its terms. The major difference to skip list is that we generalize the structure by replacing linked lists with proximity graphs. The Hierarchical NSW approach thus can utilize the same methods for making the distributed approximate search/overlay structures [45].
To select the proximity graph connections during the element insertion we utilize a heuristic that takes into account the distances between the candidate elements to create diverse connections (a similar algorithm was used in the spatial approximation tree [4] to select the tree children) instead of just selecting the closest neighbors. The heuristic examines the candidates starting from the nearest (with respect to the inserted element) and creates a connection to a candidate only if it is closer to the base (inserted) element compared to any of the already connected candidates (see Section 4 for the details).
When the number of candidates is large enough the heuristic allows getting the exact relative neighborhood graph [46] as a subgraph, a minimal subgraph of the Delaunay graph deducible by using only the distances between the nodes. The relative neighborhood graph makes it easy to keep the global connected component, even in the case of highly clustered data (see Fig. 2 for illustration). Note that the heuristic creates extra edges compared to the exact relative neighborhood graph, allowing control of the number of connections, which is important for search performance. For the case of 1D data the heuristic allows getting the exact Delaunay subgraph (which in this case coincides with the relative neighborhood graph) by using only information about the distances between the elements, thus making a direct transition from Hierarchical NSW to the 1D probabilistic skip list algorithm.
Illustration of the heuristic used to select the graph neighbors for two isolated clusters. A new element is inserted on the boundary of Cluster 1. All of the closest neighbors of the element belong to Cluster 1, thus missing the edges of the Delaunay graph between the clusters. The heuristic, however, selects an element from Cluster 2, thus maintaining the global connectivity in case the inserted element is the closest to that element compared to any other element from Cluster 1.
A base variant of the Hierarchical NSW proximity graphs was also used in [18] (called ‘sparse neighborhood graphs’) for proximity graph searching. A similar heuristic was also a focus of the FANNG algorithm [47] (published shortly after the first versions of the current manuscript were posted online) with a slightly different interpretation, based on the sparse neighborhood graph's property of the exact routing [18].
Algorithm Description
Network construction (Algorithm 1) is organized via consecutive insertions of the stored elements into the graph structure. For every inserted element an integer maximum layer l is randomly selected with an exponentially decaying probability distribution (normalized by the mL parameter, see the level assignment in Algorithm 1).
The first phase of the insertion process starts from the top layer by greedily traversing the graph in order to find the ef closest neighbors to the inserted element q in that layer. After that, the algorithm continues the search on the next layer, using the closest neighbors found on the previous layer as enter points, and the process repeats.
INSERT(hnsw, q, M, Mmax, efConstruction, mL)
Input: multilayer graph hnsw, new element q, number of established connections M, maximum number of connections for each element per layer Mmax, size of the dynamic candidate list efConstruction, normalization factor for level generation mL
Output: update hnsw inserting element q
W ← ∅ // list for the currently found nearest elements
ep ← get enter-point for hnsw
L ← level of ep // top layer for hnsw
l ← ⌊-ln(unif(0..1))·mL⌋ // new element's level
for lc ← L … l+1
    W ← SEARCH-LAYER(q, ep, ef=1, lc)
    ep ← get the nearest element from W to q
for lc ← min(L, l) … 0
    W ← SEARCH-LAYER(q, ep, efConstruction, lc)
    neighbors ← SELECT-NEIGHBORS(q, W, M, lc) // Algorithm 3 or Algorithm 4
    add bidirectional connections from neighbors to q at layer lc
    for each e ∈ neighbors // shrink connections if needed
        eConn ← neighbourhood(e) at layer lc
        if │eConn│ > Mmax // shrink connections of e; if lc = 0 then Mmax = Mmax0
            eNewConn ← SELECT-NEIGHBORS(e, eConn, Mmax, lc) // Algorithm 3 or Algorithm 4
            set neighbourhood(e) at layer lc to eNewConn
    ep ← W
if l > L
    set enter-point for hnsw to q
SEARCH-LAYER(q, ep, ef, lc)
Input: query element q, enter-points ep, number of nearest to q elements to return ef, layer number lc
Output: ef closest neighbors to q
v ← ep // set of visited elements
C ← ep // set of candidates
W ← ep // dynamic list of found nearest neighbors
while │C│ > 0
    c ← extract nearest element from C to q
    f ← get furthest element from W to q
    if distance(c, q) > distance(f, q)
        break // all elements in W are evaluated
    for each e ∈ neighbourhood(c) at layer lc // update C and W
        if e ∉ v
            v ← v ⋃ e
            f ← get furthest element from W to q
            if distance(e, q) < distance(f, q) or │W│ < ef
                C ← C ⋃ e
                W ← W ⋃ e
                if │W│ > ef
                    remove furthest element from W to q
return W
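For concreteness, a runnable Python sketch of the layer search above (heap-based; the neighbours and dist callables and the use of hashable, orderable element ids are our assumptions, not part of the paper):

import heapq

def search_layer(q, ep, ef, lc, neighbours, dist):
    """Greedy best-first search on a single layer (a sketch of Algorithm 2).
    neighbours(e, lc) returns the links of e on layer lc; dist(a, b) is the
    distance function; elements are assumed to be orderable ids."""
    visited = set(ep)
    C = [(dist(q, e), e) for e in ep]     # min-heap of candidates
    W = [(-d, e) for d, e in C]           # max-heap (negated) of found neighbours
    heapq.heapify(C)
    heapq.heapify(W)
    while C:
        d_c, c = heapq.heappop(C)         # nearest unexplored candidate
        if d_c > -W[0][0]:
            break                         # all elements in W are evaluated
        for e in neighbours(c, lc):
            if e in visited:
                continue
            visited.add(e)
            d_e = dist(q, e)
            if d_e < -W[0][0] or len(W) < ef:
                heapq.heappush(C, (d_e, e))
                heapq.heappush(W, (-d_e, e))
                if len(W) > ef:
                    heapq.heappop(W)      # drop the furthest found element
    return [e for _, e in W]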
When the search reaches the layer that is equal to or less than l, the second phase of the construction algorithm is initiated. The second phase differs in two points: 1) the ef parameter is increased from 1 to efConstruction in order to control the recall of the greedy search procedure; 2) the found closest neighbors on each layer are also used as candidates for the connections of the inserted element.
Two methods for selecting the best M neighbors from the candidates were considered: simple connection to the closest elements (Algorithm 3) and the heuristic that accounts for the distances between the candidate elements to create connections in diverse directions (Algorithm 4), described in Section 3. The heuristic has two additional parameters: extendCandidates (set to false by default), which extends the candidate set and is useful only for extremely clustered data, and keepPrunedConnections, which allows getting a fixed number of connections per element.
SELECT-NEIGHBORS-SIMPLE(q, C, M)
Input: base element q, candidate elements C, number of neighbors to return M
Output: M nearest elements to q
return M nearest elements from C to q
SELECT-NEIGHBORS-HEURISTIC(q, C, M, lc, extendCandidates, keepPrunedConnections)
Input: base element q, candidate elements C, number of neighbors to return M, layer number lc, flag indicating whether or not to extend candidate list extendCandidates, flag indicating whether or not to add discarded elements keepPrunedConnections
Output: M elements selected by the heuristic
R ← ∅
W ← C // working queue for the candidates
if extendCandidates // extend candidates by their neighbors
    for each e ∈ C
        for each eadj ∈ neighbourhood(e) at layer lc
            if eadj ∉ W
                W ← W ⋃ eadj
Wd ← ∅ // queue for the discarded candidates
while │W│ > 0 and │R│ < M
    e ← extract nearest element from W to q
    if e is closer to q compared to any element from R
        R ← R ⋃ e
    else
        Wd ← Wd ⋃ e
if keepPrunedConnections // add some of the discarded connections from Wd
    while │Wd│ > 0 and │R│ < M
        R ← R ⋃ extract nearest element from Wd to q
return R
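A compact Python sketch of the core selection rule (the extendCandidates and keepPrunedConnections branches are omitted for brevity; the dist callable is our assumption):

def select_neighbours_heuristic(q, C, M, dist):
    """Sketch of the core rule of Algorithm 4: a candidate e is kept only if
    it is closer to the base element q than to every already selected
    neighbour, which favours connections in diverse directions."""
    R = []
    for e in sorted(C, key=lambda x: dist(q, x)):   # nearest candidates first
        if len(R) == M:
            break
        if all(dist(q, e) < dist(e, r) for r in R):
            R.append(e)
    return R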
The insertion procedure terminates when the connections of the inserted elements are established on the zero layer.
The K-ANNS search algorithm used in Hierarchical NSW is presented in Algorithm 5. It is roughly equivalent to the insertion algorithm for an item with layer l = 0. The difference is that the closest neighbors found at the ground layer, which would be used as candidates for connections, are now returned as the search result. The quality of the search is controlled by the ef parameter (corresponding to efConstruction in the construction algorithm).
K-NN-SEARCH(hnsw, q, K, ef)
Input: multilayer graph hnsw, query element q, number of nearest neighbors to return K, size of the dynamic candidate list ef
Output: K nearest elements to q
W ← ∅ // set for the current nearest elements
ep ← get enter-point for hnsw
L ← level of ep // top layer for hnsw
for lc ← L … 1
W ← SEARCH-LAYER(q, ep, ef = 1, lc)
ep ← get nearest element from W to q
W ← SEARCH-LAYER(q, ep, ef, lc = 0)
return K nearest elements from W to q
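The same procedure, expressed as a Python sketch on top of the search_layer function given after Algorithm 2 (the hnsw object with enter_point, max_level and neighbours attributes is our assumed interface, not an API defined in the paper):

def k_nn_search(hnsw, q, K, ef, dist):
    """Sketch of Algorithm 5: greedy descent with ef = 1 on the upper
    layers, then a single search with the requested ef on layer 0."""
    ep = [hnsw.enter_point]
    for lc in range(hnsw.max_level, 0, -1):        # layers L ... 1
        W = search_layer(q, ep, 1, lc, hnsw.neighbours, dist)
        ep = [min(W, key=lambda e: dist(q, e))]    # nearest element found so far
    W = search_layer(q, ep, ef, 0, hnsw.neighbours, dist)
    return sorted(W, key=lambda e: dist(q, e))[:K]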
4.1 Influence of the Construction Parameters
Algorithm construction parameters mL and Mmax0 are responsible for maintaining the small world navigability in the constructed graphs.
To achieve the optimum performance advantage of the controllable hierarchy, the overlap between neighbors on different layers (i.e., the fraction of an element's neighbors that also belong to other layers) has to be small. In order to decrease the overlap, we need to decrease mL. However, at the same time, decreasing mL leads to an increase of the average number of hops during a greedy search on each layer, which negatively affects the performance. This leads to the existence of an optimal value for the mL parameter.
A simple choice for the optimal mL is 1/ln(M); this corresponds to the skip list parameter p = 1/M with an average single-element overlap between the layers.
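This correspondence can be made explicit (our restatement; k denotes a positive integer level and p the skip list promotion probability):

\[
P(l \ge k) \;=\; P\!\left(\left\lfloor -\ln\!\big(\mathrm{unif}(0,1)\big)\, m_L \right\rfloor \ge k\right)
\;=\; e^{-k/m_L} \;=\; p^{\,k},
\qquad
p = e^{-1/m_L}, \qquad p=\tfrac{1}{M} \;\Longleftrightarrow\; m_L = \tfrac{1}{\ln M}.
\]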
Plots for query time versus the mL parameter.
Selection of the Mmax0 parameter (the maximum number of connections that an element can have in the zero layer) also has a strong influence on the search performance, especially in the case of a high quality (high recall) search. Simulations show that setting Mmax0 to M (this corresponds to k-NN graphs on each layer if the neighbor selection heuristic is not used) leads to a very strong performance penalty at high recall, while setting Mmax0 to 2·M gives performance close to optimal.
Plots for query time versus the Mmax0 parameter.
In all of the considered cases, use of the heuristic for proximity graph neighbor selection (Algorithm 4) leads to a higher or similar search performance compared to the naïve connection to the nearest neighbors (Algorithm 3). The effect is most prominent for low dimensional data, at high recall for mid-dimensional data, and for the case of highly clustered data (conceptually, a discontinuity can be regarded as a local low dimensional feature); see the comparison in Fig. 7 (Core i5 2400 CPU). When the closest neighbors are used as connections for the proximity graph, the Hierarchical NSW algorithm fails to achieve a high recall for clustered data because the search gets stuck at the clusters' boundaries. In contrast, when the heuristic is used (together with the candidates' extension, the extendCandidates option in Algorithm 4), clustering leads to even higher performance. For uniform and very high dimensional data there is little difference between the neighbor selection methods (see Fig. 4).
The only meaningful construction parameter left for the user is M. A reasonable range of M is from 5 to 48. Simulations show that smaller M generally produces better results for lower recalls and/or lower dimensional data, while bigger M is better for high recall and/or high dimensional data (see Fig. 8 for illustration).
Plots for recall error versus query time for different values of the M parameter.
Selection of the efConstruction parameter is straightforward. As suggested in [26], it has to be large enough to produce a K-ANNS recall close to unity during the construction process (0.95 is enough for most use cases). And just like in [26], this parameter can possibly be auto-configured by using sample data.
The construction process can be easily and efficiently parallelized with only a few synchronization points (as demonstrated in Fig. 9) and no measurable effect on index quality. The construction speed/index quality tradeoff is controlled via the efConstruction parameter. The tradeoff between the search time and the index construction time is presented in Fig. 10 for a 10M SIFT dataset and shows that a reasonable quality index can be constructed quickly for moderate values of efConstruction, while further increases of efConstruction yield only a small search speedup at the cost of a significantly longer construction time.
Construction time for Hierarchical NSW on a 10M SIFT dataset for different numbers of threads on two CPU systems.
Plots of the query time versus construction time tradeoff for Hierarchical NSW on 10M SIFT dataset.
4.2 Complexity Analysis
4.2.1 Search Complexity
The complexity scaling of a single search can be strictly analyzed under the assumption that we build the exact Delaunay graphs instead of the approximate ones. Suppose we have found the closest element on some layer (this is guaranteed by having the Delaunay graph) and then descended to the next layer. One can show that the average number of steps before we find the closest element in the layer is bounded by a constant.
Indeed, the element levels are drawn randomly, so when traversing the graph there is a fixed probability that the next node also belongs to the upper layer. However, the search on a layer always terminates before it reaches an element that belongs to the higher layer (otherwise the search on the upper layer would have stopped at a different element), so the probability of not reaching the target after s steps decays exponentially with s. Thus the expected number of steps in a layer is bounded by the sum of a geometric progression and is independent of the dataset size.
If we assume that in the limit of the infinite dataset size the average degree of a node in the Delaunay graph is capped by a constant C (a valid assumption for data coming from a fixed distribution), then the overall average number of distance evaluations in a layer is also bounded by a constant, independently of the dataset size.
And since the expectation of the maximum layer index by construction scales as O(log(N)), the overall average complexity of a search scales as O(log(N)), in agreement with the simulations on low dimensional datasets.
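The argument can be summarized compactly (our restatement under the stated assumptions; p denotes the constant, size-independent probability that a traversed node also belongs to the next higher layer, and C bounds the node degree):

\[
P(\text{steps on a layer} \ge s) \;\le\; (1-p)^{\,s-1}
\quad\Longrightarrow\quad
\mathbb{E}[\text{steps per layer}] \;\le\; \sum_{s\ge 1} (1-p)^{\,s-1} \;=\; \frac{1}{p} \;=\; O(1),
\]
\[
\mathbb{E}[\text{distance evaluations}] \;\le\; \frac{C}{p}\cdot \mathbb{E}[\text{number of layers}]
\;=\; O(1)\cdot O(\log N) \;=\; O(\log N).
\]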
The initial assumption of having the exact Delaunay graph is violated in Hierarchical NSW due to the use of the approximate edge selection heuristic. Thus, to avoid getting stuck in a local minimum, the greedy search algorithm employs a backtracking procedure on the zero layer. Simulations show that, at least for low dimensional data (Fig. 11, d = 4), the ef parameter required to reach a fixed recall (which determines the complexity via the number of hops during the backtracking) saturates as the dataset grows, supporting the claimed scaling.
Plots of the ef parameter required to reach a fixed recall versus the dataset size.
The empirical investigation of the Delaunay graph approximation resilience (presented above) requires having the average number of Delaunay graph edges independent of the dataset size in order to assess how well the edges are approximated with a constant number of connections in Hierarchical NSW. However, the average degree of the Delaunay graph scales exponentially with the dimensionality [39], thus for high dimensional data (e.g., for d larger than a few dozen) such a direct empirical verification is not feasible.
4.2.2 Construction Complexity
The construction is done by iterative insertions of all elements, while the insertion of an element is merely a sequence of K-ANN-searches at different layers with subsequent use of the heuristic (which has fixed complexity at fixed efConstruction). The expected number of layers in which an element is to be added does not depend on the size of the dataset: E[l + 1] = E[⌊-ln(unif(0,1))·mL⌋] + 1 ≤ mL + 1.
Thus, the insertion complexity scaling with respect to the dataset size N is the same as that of the search, i.e., at most O(log(N)), and the overall construction complexity scaling is O(N·log(N)), at least for relatively low dimensional datasets.
4.2.3 Memory Cost
The memory consumption of Hierarchical NSW is mostly defined by the storage of the graph connections. The number of connections per element is Mmax0 for the zero layer and Mmax for all other layers, so the average memory consumption per element is (Mmax0 + mL·Mmax)·bytes_per_link. If the maximum total number of elements is limited to roughly four billion, four-byte unsigned integers can be used to store the connections; for typical M values this amounts to approximately 60-450 bytes per object, excluding the size of the data itself.
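A back-of-the-envelope Python sketch of this estimate (assuming Mmax0 = 2·M, Mmax = M, mL = 1/ln(M) and four-byte links; these defaults are our assumptions for illustration):

import math

def index_bytes_per_element(M: int, bytes_per_link: int = 4) -> float:
    """Link storage per element: Mmax0 links on the ground layer plus,
    on average, mL*Mmax links on the higher layers."""
    mL = 1.0 / math.log(M)
    return (2 * M + mL * M) * bytes_per_link

# For M between 6 and 48 this gives roughly 60-430 bytes per element,
# consistent with the range quoted above.
print(index_bytes_per_element(16))   # ~151 bytes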
Comparison to State-of-the-Art
The Hierarchical NSW algorithm was implemented in C++ on top of the Non-Metric Space Library (nmslib), which already had an NSW implementation (under the name "sw-graph").
Comparing the performance of K-ANNS algorithms is a non-trivial task since the state-of-the-art is constantly changing as new algorithms and implementations are emerging. In this work, we concentrated on comparison with the best algorithms in Euclid spaces that have open source implementations. An implementation of the Hierarchical NSW algorithm is also distributed as a part of the open source nmslib library1 together with an external memory-efficient header-only C++ implementation.
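For illustration, a minimal usage sketch of the header-only implementation through its Python binding (the hnswlib package; the package name, parameter values and the use of random data here are our example choices, not recommendations from the paper):

import numpy as np
import hnswlib

dim, num_elements = 128, 100_000
data = np.random.random((num_elements, dim)).astype(np.float32)

index = hnswlib.Index(space='l2', dim=dim)             # 'ip' and 'cosine' are also supported
index.init_index(max_elements=num_elements, M=16, ef_construction=200)
index.add_items(data, num_threads=4)                    # parallel incremental insertion

index.set_ef(50)                                        # query-time ef (should be >= k)
labels, distances = index.knn_query(data[:10], k=10)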
The comparison section consists of four parts: comparison to the baseline NSW (Section 5.1), comparison to the state-of-the-art algorithms in Euclid spaces (Section 5.2), rerun of the subset of tests [34] in general metric spaces in which NSW failed (Section 5.3) and comparison to state-of-the-art PQ-algorithms on a large 200M SIFT dataset (Section 5.4).
5.1 Comparison with Baseline NSW
For the baseline NSW algorithm implementation, we used the “sw-graph” from nmslib 1.1 (which is slightly updated compared to the implementation tested in [33], [34]) to demonstrate the improvements in speed and algorithmic complexity (measured in the number of distance computations).
Fig. 12(a) presents a comparison of Hierarchical NSW to NSW for a 10 million 4-dimensional random vectors dataset.
Comparison between NSW and Hierarchical NSW: (a) distance calculation number versus accuracy tradeoff for a 10 million 4-dimensional random vectors dataset; (b, c) performance scaling in terms of the number of distance calculations (b) and raw query time (c) on an 8-dimensional random vectors dataset.
The scalings of the algorithms on an 8-dimensional random vectors dataset are presented in Figs. 12(b) and 12(c).
5.2 Comparison in Euclid Spaces
The main part of the comparison was carried out on vector datasets with use of the popular K-ANNS benchmark ann-benchmark3 as a testbed. The testing system utilizes python bindings of the algorithms: it sequentially runs the K-ANN search for one thousand queries (extracted via a random split of the initial dataset) with preset algorithm parameters, producing an output containing the recall and the average time of a single search. The considered algorithms are:
Baseline NSW algorithm from nmslib 1.1 (“sw-graph”).
FLANN 1.8.4 [6]. A popular library4 containing several algorithms, built-in in OpenCV5. We used the available auto-tuning procedure with several reruns to infer the best parameters.
Annoy6, 02.02.2016 build. A popular algorithm based on random projection tree forest.
VP-tree. A general metric space algorithm with metric pruning [50] implemented as a part of nmslib 1.1.
FALCONN7, version 1.2. A new efficient LSH algorithm for cosine similarity data [51].
The comparison was done on a 4X Xeon E5-4650 v2 Debian OS system with 128 Gb of RAM. For every algorithm, we carefully chose the best results at every recall range to evaluate the best possible performance (with initial values from the testbed defaults). All tests were done in a single thread regime. Hierarchical NSW was compiled using the GCC 5.3 with -Ofast optimization flag.
The parameters and description of the used datasets are outlined in Table 1. For all of the datasets except GloVe we used the L2 distance. For GloVe we used the cosine similarity (equivalent to L2 after vector normalization). The brute-force (BF) time is measured by the nmslib library.
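The claimed equivalence for GloVe can be verified directly (a small numpy sanity check, not taken from the paper):

import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 64))
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
# For unit-norm vectors ||a - b||^2 = 2 - 2*cos(a, b), so L2 and cosine
# produce identical nearest-neighbour rankings after normalization.
assert np.isclose(np.sum((a - b) ** 2), 2 - 2 * np.dot(a, b))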
Results for the vector data are presented in Fig. 13. For the SIFT, GloVe, DEEP and CoPhIR datasets Hierarchical NSW clearly outperforms the rivals by a large margin. For the low dimensional data Hierarchical NSW also leads, although by a smaller margin.
Results of the comparison of Hierarchical NSW with open source implementations of K-ANNS algorithms on five datasets for 10-NN searches. The time of a brute-force search is denoted as the BF.
5.3 Comparison in General Spaces
A recent comparison of algorithms [34] in general spaces (i.e., non-symmetric or with violation of the triangle inequality) showed that the baseline NSW algorithm has severe problems on low dimensional datasets. To test the performance of the Hierarchical NSW algorithm we have repeated a subset of tests from [34] on which NSW performed poorly or suboptimally. For that purpose, we used the built-in nmslib testing system, which had scripts to run the tests from [34]. The evaluated algorithms included the VP-tree, permutation techniques (NAPP and brute-force filtering) [49], [55], [56], [57], the basic NSW algorithm and NNDescent-produced proximity graphs [29] (both paired with the NSW graph search algorithm). As in the original tests, for each dataset the results include either NSW or NNDescent, depending on which structure performed better. No custom distance functions or special memory management were used in this case for Hierarchical NSW, leading to some performance loss.
The datasets are summarized in Table 2. Further details of the datasets, spaces and algorithm parameter selection can be found in the original work [34]. The brute-force (BF) time is measured using the nmslib library.
The results are presented in Fig. 14. Hierarchical NSW significantly improves the performance of NSW and is the leader on all of the tested datasets. The strongest enhancement over NSW, of almost three orders of magnitude, is observed for the dataset with the lowest dimensionality, wiki-8 with JS-divergence. This is an important result that demonstrates the robustness of Hierarchical NSW, as for the original NSW this dataset was a stumbling block. Note that for wiki-8, to nullify the effect of implementation differences, the results are presented in terms of the number of distance computations instead of CPU time.
Results of the comparison of Hierarchical NSW with general space K-ANNS algorithms from the Non-Metric Space Library on five datasets for 10-NN searches. The time of a brute-force search is denoted as the BF.
5.4 Comparison with Product Quantization Based Algorithms
Product quantization K-ANNS algorithms [10], [11], [12], [13], [14], [15], [16], [17] are considered as the state-of-the-art on billion scale datasets since they can efficiently compress stored data, allowing modest RAM usage while achieving millisecond search times on modern CPUs.
To compare the performance of Hierarchical NSW against PQ algorithms we used the Facebook Faiss library8 as the baseline (a new library with state-of-the-art PQ algorithm [12], [15] implementations, released after the current manuscript was submitted), compiled with the OpenBLAS backend. The tests were done for a 200M subset of the 1B SIFT dataset [13] on a 4X Xeon E5-4650 v2 server with 128 Gb of RAM. The ann-benchmark testbed was not feasible for these experiments because of its reliance on the 32-bit floating point format (requiring more than 100 Gb just to store the data). To get the results for the Faiss PQ algorithms we have utilized built-in scripts with the parameters from the Faiss wiki9. For the Hierarchical NSW algorithm, we used a special build outside of nmslib with a small memory footprint, simple non-vectorized integer distance functions and support for incremental index construction10.
The results are presented in Fig. 15 with a summary of the parameters in Table 3. The peak memory consumption was measured using the Linux "time -v" tool in separate test runs after index construction for both of the algorithms. Even though Hierarchical NSW requires significantly more RAM, it can achieve much higher accuracy while offering a massive advantage in search speed and much faster index construction.
Results of comparison with Faiss library on the 200M SIFT dataset from [13]. The inset shows the scaling of the query time versus the dataset size for Hierarchical NSW.
The inset in Fig. 15 presents the scaling of the query time versus the dataset size for Hierarchical NSW. Note that the scaling deviates from pure logarithmic, possibly due to the relatively high dimensionality of the dataset.
Discussion
By using a structural decomposition of navigable small world graphs together with the smart neighbor selection heuristic, the proposed Hierarchical NSW approach overcomes several important problems of the basic NSW structure, advancing the state-of-the-art in K-ANN search. Hierarchical NSW offers excellent performance and is a clear leader on a large variety of datasets, surpassing the open-source rivals by a large margin in the case of high dimensional data. Even for the datasets where the previous algorithm (NSW) lost by orders of magnitude, Hierarchical NSW was able to come first. Hierarchical NSW supports continuous incremental indexing and can also be used as an efficient method for getting approximations of the k-NN and relative neighborhood graphs, which are byproducts of the index construction.
Robustness of the approach is a strong feature which makes it very attractive for practical applications. The algorithm is applicable in generalized metric spaces performing the best on any of the datasets tested in this paper and thus eliminating the need for complicated selection of the best algorithm for a specific problem. We stress the importance of the algorithm's robustness since the data may have a complex structure with different effective dimensionality across the scales. For instance, a dataset can consist of points lying on a curve that randomly fills a high dimensional cube, thus being high dimensional at large scale and low dimensional at small scale. In order to perform the efficient search in such datasets, an approximate nearest neighbor algorithm has to work well for both cases of high and low dimensionality.
There are several ways to further increase the efficiency and applicability of the Hierarchical NSW approach. There is still one meaningful parameter left which strongly affects the construction of the index: the number of added connections per layer, M.
One of the apparent shortcomings of the proposed approach compared to the basic NSW is the loss of the possibility of distributed search. The search in the Hierarchical NSW structure always starts from the top layer, thus the structure cannot be made distributed by using the same techniques as described in [26] due to congestion of the higher layer elements. Simple workarounds can be used to distribute the structure, such as partitioning the data across cluster nodes studied in [6], however, in this case, the total parallel throughput of the system does not scale well with the number of computer nodes.
Still, there are other possible known ways to make this particular structure distributed. Hierarchical NSW is ideologically very similar to the well-known one-dimensional exact search probabilistic skip list structure and thus can use the same techniques to make the structure distributed [45]. Potentially this can lead to even better distributed performance compared to the base NSW due to logarithmic scalability and ideally uniform load on the nodes.
ACKNOWLEDGMENTS
We thank Leonid Boytsov for many helpful discussions, assistance with Non-Metric Space Library integration and comments on the manuscript. We thank Seth Hoffert and Azat Davletshin for the suggestions on the manuscript and the algorithm and fellows who contributed to the algorithm on the github repository. We also thank Valery Kalyagin for support of this work. The reported study was funded by RFBR, according to the research project No. 16-31-60104 mol_a_dk. The work has been done while Yury A. Malkov was with the Federal state budgetary institution of science Institute of Applied Physics of the Russian Academy of Sciences, 46 Ul'yanov Street, 603950 Nizhny Novgorod, Russia. Dmitry A. Yashunin has done this work as an independent researcher.