Abstract
Network homophily, the tendency of similar nodes to be connected, and transitivity, the tendency of two nodes to be connected if they share a common neighbor, are conflated properties in network analysis since one mechanism can drive the other. Here, we present a generative model and corresponding inference procedure that are capable of distinguishing between the two mechanisms. Our approach is based on a variation of the stochastic block model (SBM) with the addition of triadic closure edges, and its inference can identify the most plausible mechanism responsible for the existence of every edge in the network, in addition to the underlying community structure itself. We show how the method can avoid detecting spurious communities caused solely by the formation of triangles in the network and how it can improve the performance of edge prediction when compared to the pure version of the SBM without triadic closure.
- Received 11 May 2021
- Revised 13 September 2021
- Accepted 26 October 2021
DOI: https://doi.org/10.1103/PhysRevX.12.011004
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
Popular Summary
The network of social connections between friends, the interactions between proteins, metabolic relationships in the cell, links between websites, and many other systems are almost always the result of a mixture of generative mechanisms. These mechanisms often operate at distinct scales (globally or locally) but nevertheless leave traces in the network structure that are difficult to distinguish from one another. Here, we provide a way to distinguish two key generative mechanisms based only on a final snapshot of the system.
In our study, we explore two network processes: homophily (the tendency of two nodes to connect if they share some underlying property) and triadic closure (the tendency of two nodes to connect if they already share a neighbor). Although distinct, these two processes lead to similar observed patterns in the network.
For each link in a network, our method can reveal whether it was more likely the result of triadic closure or homophily. From this, we can decide if dense “communities” in the network are more likely the result of one or the other. Likewise, we can tell if the presence of “triangles” (groups of three nodes all connected to each other) is a direct result of triadic closure or homophily. This has important implications for the interpretation of network data and also for the prediction of missing or unobserved edges in the networks.
Our methodology paves the way for a general, principled, and effective approach to disentangling local, global, and mesoscopic mechanisms of network formation.
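To make the two mechanisms concrete, the following is a minimal, illustrative sketch of a generative process that mixes them. It is a simplified, single-round construction written for this summary and is not the paper's exact model or its inference procedure. The function name `sbm_with_triadic_closure` and the parameters `p_in`, `p_out`, and `p_close` are hypothetical: seed edges are first placed by group membership (homophily), and then each pair of nodes that shares a neighbor in the seed graph is connected with a fixed probability (triadic closure).

```python
import itertools
import random


def sbm_with_triadic_closure(block_sizes, p_in, p_out, p_close, seed=0):
    """Illustrative generator: SBM seed edges plus one round of triadic closure.

    All names and parameters here are assumptions made for illustration;
    this is not the formulation used in the paper.
    """
    rng = random.Random(seed)

    # Assign each node to a block (its latent "group").
    block_of = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(block_of)

    # Step 1 (homophily): connect pairs with a probability that depends
    # only on whether the two nodes belong to the same block.
    seed_edges = set()
    for u, v in itertools.combinations(range(n), 2):
        p = p_in if block_of[u] == block_of[v] else p_out
        if rng.random() < p:
            seed_edges.add((u, v))  # stored with u < v

    # Adjacency sets of the seed graph.
    neighbors = {u: set() for u in range(n)}
    for u, v in seed_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Step 2 (triadic closure): for every node u, each pair of its seed
    # neighbors that is not already connected is closed with prob. p_close.
    closure_edges = set()
    for u in range(n):
        for v, w in itertools.combinations(sorted(neighbors[u]), 2):
            if (v, w) not in seed_edges and rng.random() < p_close:
                closure_edges.add((v, w))  # v < w because of sorting

    return block_of, seed_edges, closure_edges
```

For example, `sbm_with_triadic_closure([20, 20], p_in=0.2, p_out=0.01, p_close=0.3)` returns the block labels together with the two edge sets kept separate. In observed data this separation is hidden: only the union of the two edge sets is seen, and recovering which edge came from which mechanism is the inference problem the study addresses.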