similarity

Contents

similarity#

graph_tool.topology.similarity(g1, g2, eweight1=None, eweight2=None, label1=None, label2=None, norm=True, p=1.0, distance=False, asymmetric=False)[source]#

Return the Jaccard adjacency similarity between two graphs.

Parameters:
g1Graph

First graph to be compared.

g2Graph

Second graph to be compared.

eweight1EdgePropertyMap (optional, default: None)

Edge weights for the first graph to be used in comparison.

eweight2EdgePropertyMap (optional, default: None)

Edge weights for the second graph to be used in comparison.

label1VertexPropertyMap (optional, default: None)

Vertex labels for the first graph to be used in comparison. If not supplied, the vertex indices are used.

label2VertexPropertyMap (optional, default: None)

Vertex labels for the second graph to be used in comparison. If not supplied, the vertex indices are used.

normbool (optional, default: True)

If True, the returned value is normalized by the total number of edges.

pfloat (optional, default: 1.)

Exponent of the \(L^p\) distance function.

distancebool (optional, default: False)

If True, the complementary value is returned, i.e. the distance between the two graphs.

asymmetricbool (optional, default: False)

If True, the asymmetric similarity of g1 to g2 will be computed.

Returns:
similarityfloat

Adjacency similarity value.

Notes

In its default parametrization, the Jaccard adjacency similarity is the sum of equal non-zero entries in the adjacency matrices of both graphs, given a vertex ordering determined by the vertex labels. In other words, it counts the number of edges which have the same source and target labels in both graphs.

If norm == True (the default) the value returned is the total fraction of edges of both networks that match.

This function also allows for generalized similarities according to an \(L^p\) norm, for arbitrary exponent \(p\).

More specifically, the (unormalized) adjacency similarity is defined as:

\[S(\boldsymbol A_1, \boldsymbol A_2) = E - d(\boldsymbol A_1, \boldsymbol A_2)\]

where

\[d(\boldsymbol A_1, \boldsymbol A_2) = \left(\sum_{i\le j} \left|A_{ij}^{(1)} - A_{ij}^{(2)}\right|^p\right)^{1/p}\]

is the corresponding distance between graphs, and \(E=(\sum_{i\le j}|A_{ij}^{(1)}|^p + |A_{ij}^{(2)}|^p)^{1/p}\). Unless otherwise stated via the parameter p, the exponent used is \(p=1\). This definition holds for undirected graphs, otherwise the sums go over all directed pairs. If edge weights are provided, the weighted adjacency matrix is used.

If a multigraph is passed, the matrix entries \(A^{(1)}_{ij}\) and \(A^{(2)}_{ij}\) correspond to the edge multiplicities between nodes \(i\) and \(j\) in each graph.

If norm == True the value returned is \(S(\boldsymbol A_1, \boldsymbol A_2) / E\), which lies in the interval \([0,1]\).

If asymmetric == True, the above is changed so that the comparison is made only for entries in \(\boldsymbol A_1\) that are larger than in \(\boldsymbol A_2\), i.e.

\[d(\boldsymbol A_1, \boldsymbol A_2) = \left(\sum_{i\le j} \left|A_{ij}^{(1)} - A_{ij}^{(2)}\right|^p H(A_{ij}^{(1)} - A_{ij}^{(2)})\right)^{1/p},\]

where \(H(x) = \{1 \text{ if } x > 0; 0 \text{ otherwise}\}\) is the unit step function, and the total sum is changed accordingly to \(E=\left(\sum_{i\le j}|A_{ij}^{(1)}|^p\right)^{1/p}\).

Relation to set operations

If the graph is unweighted, \(p=1\), and norm == False the algorithm is equivalent to the following set comparisons:

  1. If distance == True the returned value is equal to

    \[2 |\boldsymbol A_1 \cup \boldsymbol A_2|\]

    where \(\boldsymbol A_1 \cup \boldsymbol A_2\) is the union of the edges in \(\boldsymbol A_1\) and \(\boldsymbol A_2\).

  2. If distance == False the returned value is equal to

    \[2 |\boldsymbol A_1 \cap \boldsymbol A_2|\]

    where \(\boldsymbol A_1 \cap \boldsymbol A_2\) is the intersection of the edges in \(\boldsymbol A_1\) and \(\boldsymbol A_2\).

  3. If distance == True and asymmetric == True the returned value is equal to

    \[|\boldsymbol A_1 \setminus \boldsymbol A_2|\]

    where \(\boldsymbol A_1 \setminus \boldsymbol A_2\) is the set of edges in \(\boldsymbol A_1\) that are not also in \(\boldsymbol A_2\).

The algorithm runs with complexity \(O(E_1 + V_1 + E_2 + V_2)\).

Parallel implementation.

If enabled during compilation, this algorithm will run in parallel using OpenMP. See the parallel algorithms section for information about how to control several aspects of parallelization.

(The above is applicable only if the vertex labels are integers bounded by the sizes of the graphs.)

Examples

>>> g = gt.random_graph(100, lambda: (3,3))
>>> u = g.copy()
>>> gt.similarity(u, g)
1.0
>>> gt.random_rewire(u)
190
>>> gt.similarity(u, g)
0.03