generate_knn

Contents

generate_knn#

graph_tool.generation.generate_knn(points, k, dist=None, pairs=False, exact=False, r=0.5, max_rk=None, epsilon=0.001, c_stop=False, max_iter=0, directed=False, verbose=False)[source]#

Generate a graph of k-nearest neighbors (or pairs) from a set of multidimensional points.

Parameters:
pointsiterable of lists (or numpy.ndarray) of dimension \(N\times D\) or int

Points of dimension \(D\) to be considered. If the parameter dist is passed, this should be just an int containing the number of points.

kint

Number of nearest neighbors per node (or number of pairs if pairs is True).

distfunction (optional, default: None)

If given, this should be a function that returns the distance between two points. The arguments of this function should just be two integers, corresponding to the vertex index. In this case the value of points should just be the total number of points. If dist is None, then the L2-norm (Euclidean distance) is used.

pairsbool (optional, default: False)

If True, the k closest pairs of nodes will be returned, otherwise the k nearest neighbors for every edge is returned.

exactbool (optional, default: False)

If False, an fast approximation will be used, otherwise an exact but slow algorithm will be used.

rfloat (optional, default: .5)

If exact is False, this specifies the fraction of randomly chosen neighbors that are used for the search.

max_rkint (optional, default: None)

If provided and exact is False, this specifies the maximum number of randomly chosen out neighbors to consider during each iteration. A value of None implies that all out neighbors are considered.

epsilonfloat (optional, default: .001)

If exact is False and c_stop is False, this determines the convergence criterion used by the algorithm. When the fraction of updated neighbors drops below this value, the algorithm stops.

c_stopbool (optional, default: False)

If True, an alternative stopping criterion will be used: The iteration ends when the global clustering coefficient of the undirected KNN graph stopped increasing. In this case, the paramter epsilon is ignored.

max_iterint (optional, default: 0)

If exact is False, this determines the maximum number of iterations allowed. A value of 0 means that no limit is imposed.

directedbool (optional, default: False)

If True a directed version of the graph will be returned, otherwise the graph is undirected.

Returns:
gGraph

The k-nearest neighbors graph.

wEdgePropertyMap

Edge property map with the computed distances.

Notes

The approximate version of this algorithm is based on [dong-efficient-2011], and has a (conjectured) run-time of \(O(k^2N\log N)\), where \(N\) is the number of points. The exact version has a complexity of \(O(N^2)\).

If pairs is True, the \(k\) closest pairs are found from the nearest neighbors problem as described in [lenhof-k-closest], which has a complexity upper bounded by

Parallel implementation.

If enabled during compilation, this algorithm will run in parallel using OpenMP. See the parallel algorithms section for information about how to control several aspects of parallelization.

References

[dong-efficient-2011]

Wei Dong, Charikar Moses, and Kai Li, “Efficient k-nearest neighbor graph construction for generic similarity measures”, In Proceedings of the 20th international conference on World wide web (WWW ‘11). Association for Computing Machinery, New York, NY, USA, 577–586, (2011) DOI: https://doi.org/10.1145/1963405.1963487 [sci-hub, @tor]

[lenhof-k-closest]

HP Lenhof, M Smid, “The k closest pairs problem”, https://people.scs.carleton.ca/~michiel/k-closestnote.pdf

Examples

>>> points = np.random.random((1000, 10))
>>> g, w = gt.generate_knn(points, k=5)