- Access by Staats- und Universitaetsbibliothek Bremen
Statistical mechanics of multiedge networks
Phys. Rev. E 88, 062806 – Published 5 December, 2013
DOI: https://doi.org/10.1103/PhysRevE.88.062806
Abstract
Statistical properties of binary complex networks are well understood and recently many attempts have been made to extend this knowledge to weighted ones. There are, however, subtle yet important considerations to be made regarding the nature of the weights used in this generalization. Weights can be either continuous or discrete magnitudes, and in the latter case, they can additionally have undistinguishable or distinguishable nature. This fact has not been addressed in the literature insofar and has deep implications on the network statistics. In this work we face this problem introducing multiedge networks as graphs where multiple (distinguishable) connections between nodes are considered. We develop a statistical mechanics framework where it is possible to get information about the most relevant observables given a large spectrum of linear and nonlinear constraints including those depending both on the number of multiedges per link and their binary projection. The latter case is particularly interesting as we show that binary projections can be understood from multiedge processes. The implications of these results are important as many real-agent-based problems mapped onto graphs require this treatment for a proper characterization of their collective behavior.
Article Text
The increasing and unprecedented quality and quantity of available data coming from very different areas is boosting the field of complex networks. Interdisciplinary science demands new efforts and new tools, addressed not only to develop more efficient computational strategies to analyze incoming data but also to span a theoretical framework where both more accurate and more tractable models can provide predictions closer to the reality one wants to face. In this context, a standard approach consists of representing in a graph the complex structure of interactions among the elements of a given system. Statistical mechanics is an extraordinary framework where such a complex structure can be appropriately modeled and with this aim a large amount of studies has appeared in recent years .
The simplest representation of a network assumes the existence of nodes and edges. The edges do not need to be necessarily symmetric and they can provide information about the relative influence (interaction) of a node onto another, i.e., they can be directed and have a certain intensity. However, the early studies in the field were essentially focused on a binary projection of the network on a graph where only the existence of an edge and its distribution were required to determine some of its properties, without considering the effect of weights. In this way one could compute the probability for two arbitrary sites (nodes) to be connected through an edge. It is well known that such probability keeps certain analogies with occupation numbers in quantum statistics, in particular, with fermionic systems . Further developments have extended these results to richer and more complex structures such as directed and weighted graphs finding analogy with bosonic systems .
A successful and complete description of any system amenable to be represented as a complex network through statistical mechanics requires an appropriate characterization regarding the detailed features of the microscopic components forming the network. Up to now, the actions represented by an edge have always been considered as indistinguishable events, but in some cases, the data represented by edges and their weights can be generated by independent, identifiable actions. This circumstance needs to be taken into account whenever using any statistical technique applied to complex networks. Its impact being important as can be seen in analogy to statistical mechanics where the distinguishability of particles leads to different descriptions in terms of Fermi-Dirac, Bose-Einstein or Maxwell-Boltzmann statistics.
Processes generated by single agents represent single events, for instance, a trip between two locations, a call between two cities or communications at a given time. A particularly clear example where this distinguishability of weight units is crucial are transportation networks , where human flows between different locations are used to build the so-called Origin Destination matrices which collect global information about their mobility or allocation. A naive approach based on a standard weighted description of a network is not satisfactory for this problem as it was already pointed out by Wilson who mapped transport systems to statistical physics using an entropy maximization approach. Hence, modern complex networks theory calls for the introduction of new general models taking into account the possible distinguishability of the edges forming the considered network.
The present work addresses this issue by presenting a general statistical mechanics approach to networks created from distinguishable single-unit events that can be grouped in edges, in analogy to the allocation of particles in discrete energy levels of classical statistical mechanics systems. In such networks, the edge weights encode information about independent processes performed by different agents at (possibly) different times, and hence those are necessarily represented by integer numbers and constitute quantized, distinguishable entities. The difference between this representation, which leads to what we call multiedge networks, and the already considered weighted ones is clear as schematically shown in Fig. . Multiedge networks assume the existence of a minimal weight of unit value (and hence a quantization) representing an indissociable event. In some cases, groups of these events connect the same pair of nodes, hence forming an edge with multiple connections, which has a different nature from a weighted one where no quantization is imposed. If one aims to apply the well-known ensemble theory used in statistical physics to networks , one needs to take into account this subtle yet important difference, which has profound implications on the associated statistics. The choice of one or the other representation will thus depend on the problem at hand and makes a big difference in terms of collective behavior of the whole network as will be shown along this paper. This study fills the existing gap in the field by introducing a complete and necessary discussion on distinguishability that is to be added to the already existing models and thus complements the complex networks body of theory.
The distinguishability issue: Out of a set of three distinguishable nodes and three distinguishable (directed) events, one can see that the number of configurations (described in terms of edge names) compatible with a given occupation number sequence is degenerate, since all permutations of events generate different microstates.
Many examples of systems studied using network theory can be understood as multiedge networks. All those networks composed by the actions of independent agents (mobility networks , call networks ) and in general those whose edges are formed by the sum of independent actions developed during a certain time fall within this category . The present work deals with the challenge of introducing null models that provide exact analytical expectations for the main network observables under some given fixed constraints. This, in turn, allows one to assess whether any observed network characteristic from real data is due to the imposed constraints or to the contrary, to some unexpected phenomena worth exploring.
The paper is structured as follows: Sec. introduces the concept of multiedge networks together with some basic nomenclature and definitions used. Sections introduce the general derivation of the considered ensembles using different sets of constraints (mainly linear constraints on occupation numbers, binary-projection constraints, and both). In each section, some important examples are explicitly derived and formulas for the state statistics and the ensemble entropies are obtained. Finally some concluding remarks are given discussing the work done. The Appendices contain details, additional discussion, and some mathematical developments on mentioned examples in the text.
A multiedge network is a collection of nodes that may be connected by none, one or more than one element (event) taken from a set of independent, distinguishable entities. Such an object admits a coarse-grained representation in terms of a multiedge matrix T. The matrix entries
Such a function ensures that
Our goal is to construct an ensemble framework that allows one to treat multiedge networks with any given set of constraints
where
and then maximizing its associated entropy,
where the sum runs over all possible configurations of events (microstates) on the accessible phase space
In the remainder of the paper, we shall work on the occupation-number space
where we changed the notation
A final comment deserves the seminal work by Wilson on transport theory . He followed a similar scheme and found expressions for the expected number of events connecting two arbitrary nodes according to some considered constraints. However, in the present paper we go further; we present a modern methodology able to handle more complicated constraints as compared to those analyzed by Wilson and also consider constraints which affect simultaneously the distribution of occupation numbers and its binary projection on the graph, for instance, those affecting the strength (weighted degree ) of a node and its degree.
In this work we proceed following a microcanonical scenario. In this ensemble, all the configurations are equally probable and the constraints are considered “hard,” i.e., the phase space accessible always fulfils strictly the constraints. Thus, we must only maximize the expression
We follow the same methodology developed in seminal works by Bianconi and write down the volume of the
where
If we wish to perform the sum over occupation numbers (
For this ensemble and for linear constraints on occupation numbers, one can write
being
The occupation number statistics can be shown to have multinomial nature (see Appendix ),
This fact assures that the relative fluctuations of occupation numbers vanish in the thermodynamic limit (
We identified
Concerning the entropy of the graphs in this ensemble, the integral in can be approximated to first order by using the steepest descent methods,
where
Merging one obtains the event-specific entropy of a given graph in this ensemble:
which has a final Shannon entropy form, equivalent to the Boltzmann-Gibbs entropy.
If one is able to exactly solve the saddle point equations obtained from the steepest descent approximation, the full distribution of occupation numbers is recovered. Despite being in a microcanonical framework, one could consider a canonical ensemble where the constraints are fulfilled on average, hence
where we have used the properties of the multinomial distribution presented in . Although the theoretical basis for the generation of graphs in different ensembles is introduced in this paper (see Appendix ), the challenges for the exact and efficient generation of such ensembles will be shortly tackled and presented in future work.
In the following, we shall consider some explicit cases of linear constraints on This is the simplest case where we have a single hard constraint (apart from the number of nodes In this case it is straightforward to determine the average value of the occupation numbers (in this case Therefore, all the occupation numbers have constant probability Other typical network magnitudes of interest such as the strengths (both incoming Concerning the Boltzmann-Gibbs entropy of the ensemble, Which recovers a Shannon form over events as expected, If additionally to the number of events which leads to the saddle point equation, and finally, And this leads to a weighted version of the Waxman graph . This reasoning can be extended to study the interesting case where the distribution of costs is also fixed, which is of particular interest in the field of O-D matrices used to analyze mobility, where the mobility of users using certain types of transports is assumed to follow particular statistical forms (). This case is analyzed in detail and solved in Appendix . For any of the cases involving cost matrices, especially those related with distances, it is very important to remark that the allocation of occupation numbers is not independent in each state and hence it is not true that the probability We consider now the case in which the only given constraint is the strength sequence and Eq. becomes Now we need to solve the And we apply again to get where we have identified For large and we recover the weighted configuration model , This result recovers the expression in and is in accordance with a maximum likelihood principle (contrary to the case of weighted networks as explained in ). Let us notice that these results allow a straightforward extension to the canonical ensemble. By identifying from , one can work with a strength sequence which is no longer fixed but a collection of fluctuating integer random values with a multinomial structure, since the partial grouping of multinomial random variables Concerning the entropy in this canonical ensemble scenario, making use of we further obtain a closed expression in terms of the constraints of the problem, Let us remind that in this context Further additional constraints can be added leading to different models. An example would be to merge the last two examples of a network living in a metric space (or any network where we can give a cost to the edges) with fixed “accessibility” of each node (distance-weighted strength). We could also fix the average cost per trip The case of the gravity law models deserves a closer look since an entropy maximization approach yields a form
IV. MULTIEDGE NETWORK WITH GIVEN NONLINEAR CONSTRAINTS ON THE BINARY PROJECTION OF THE OCCUPATION NUMBERS 𝛩 ( 𝑡 𝑖 𝑗 )
Let us consider now more complex situations such as the case of nonlinear constraints on
The main technical difficulty in dealing with these types of constraints is that they do not allow the summation in a multinomial form of the terms in
Proceeding as explained we obtain a new version of Eq. :
where we have introduced auxiliary fields
From expression one can compute explicitly those values yielding
For ease in notation, we perform the change
which in turn, regarding only the binary projection, corresponds to the outcome of independent Bernoulli processes with probabilities
From one sees that in this case the multiedge structure is completely determined by the binary constrained topology. Our coarse-grained description in terms of independent occupation numbers implies that for each state, two outcomes can be considered: Either the edge does not exist (obviously with 0 occupation) or it does exist, in which case the resulting (conditioned) statistics being Poisson with mean value
The constant relation of proportionality
which can be inverted leading to
where
With and we can compute the relative fluctuations of both occupation numbers and binary links in the thermodynamic limit,
These expressions reflect the bimodal structure of the state statistics and explains the nonvanishing nature of the relative fluctuations: The variance of the occupation numbers has a maximum for
Concerning the entropy, performing the steepest descent approximation on to first order as in we have
Using
This expression has two clear contributions: The first term
The second term in can be explicitly evaluated in two limiting cases: The dense case which corresponds to the thermodynamic limit and the sparse for which the binary and weighted structure are equal
Considering the sparse case, one has from
which corresponds to the microcanonical counting of configurations coming from valid permutations of distinguishable multiedges over the fixed binary structure of
The dense case corresponds to
which has a Shannon form if we consider
In the following, we present some examples to clarify the usage of this new methodology. We start by considering the most simple case where we only fix the total number of events In this case we introduce two Lagrange multipliers ( where we identify We recover the binary structure of the well-known Erdös-Renyi graph as a result of the binary projection of a nontrivial multiedge structure, which on average values fulfills Despite the average occupation numbers over the ensemble being equal to the cases in Sec. , the underlying statistic is not. All the nodes (and states) in this case are statistically equivalent and their associated strengths and degrees (incoming and outgoing) are proportional on average (since they are fluctuating quantities not being fixed by the constraints): The entropy is readily computed from yielding, Expression can also be computed using combinatorial arguments: Consider a process in which one selects and hence the microcanonical entropy is The next important case to consider is the one where the node binary connectivity of the graph is fixed. Such a situation is specially interesting as our framework permits one to understand binary networks as the projection (for instance, due to partial information or limited resolution) of a process generated by independent agents. In this case the constraints read So we introduce two sets of Lagrange multipliers ( And we solve the saddle point equations, where we have identified As expected we recover the proportionality of strengths and degrees for each node under this particular set of constraints. Considering only the binary projection of the graph, one gets where we have identified The previous expression corresponds exactly with the expression for the so-called canonical ensemble of the random graph with any given degree distribution , where for fixed
V. MULTIEDGE NETWORK WITH GIVEN LINEAR CONSTRAINTS DEPENDING ON BOTH THE OCCUPATION NUMBERS 𝑡 𝑖 𝑗 AND THEIR BINARY PROJECTION 𝛩 ( 𝑡 𝑖 𝑗 )
We analyze for the sake of completeness the most general case where both types of considered constraints are fixed, and .
We introduce two sets of Lagrangian multipliers for the multiedge constraints
Using and we obtain the statistic for the occupation numbers and their projections,
Assuming that the saddle point equations can be solved, which means that the imposed combination of binary and multiedge constraints is graphical, i.e.,
where
The relative fluctuations on occupation numbers do not vanish in the thermodynamic limit due to the strong constraints imposed by the binary structure. One can still express
In the thermodynamic limit (
Regarding the relation between expected occupation numbers and their binary projections, one has
Concerning the entropy, approximating by saddle point methods using we obtain the general expression that includes all the previous cases considered,
where
The two limiting cases early considered can again be evaluated. The sparse case implies that
which is identical to the previous one (since it is equivalent to dropping the strength constraints).
For the thermodynamic limit (dense case), we have
for which the previous expressions encountered are limiting cases. On one hand, if we relax the constraints on the occupation numbers, then
For simplicity, the only example we report in this section corresponds to the very relevant case in which both strength and degree sequences are fixed, the rest of the cases being easily derivable from the general theory exposed. We analyze here a situation where the constraints imposed are the strength and degree sequence [ and ]. The calculations are analogous to the previous section obtaining where we have identified Note that the expressions found in the previous cases are particular examples of this general problem and can be readily recovered by removing the appropriate constraints, i.e., making the Lagrange multipliers equal to zero, which in this case is equivalent to setting We can revisit the case where only the strength sequence is fixed. Now, although the resulting statistics are Poisson and not multinomial, it can be proved that in the thermodynamic limit both descriptions are equivalent (see Appendix ). Additionally, in such a case one can obtain the statistics of the binary projection of the occupation numbers Unfortunately, the explicit form of the Lagrange multipliers for the degrees or the strengths cannot be solved, since the uncorrelated approximation is no longer valid, In this last expression the factorization of the connection probability in two node-dependent magnitudes is impossible, despite the approximation assumed. Hence one sees again that there is no way of generating uncorrelated networks at the level of degrees under the strict set of constraints considered.
The present work deals with the statistical framework of multiedge networks, which in contrast to weighted networks, are entities formed by distinguishable events that can be grouped in edges weighted by integer numbers. It introduces flexible analytical null models that provide expectations for the main variables of a system represented in terms of a complex network (the entries of its adjacency matrix) thus complementing the existing studies on weighted networks. The decision upon which model to take depends on the kind of (physical) process generating the network at study.
We have started by properly defining the differences between weighted and multiedge networks based on the distinguishability or not of the elements forming a network. We have then properly set up a framework of multiedge networks in a statistical mechanics approach by defining ensembles of networks fulfilling some given sets of general constraints. Such a framework has been used to obtain analytical expressions considering cases with general constraints depending linearly on the number of events allocated to each edge as well as their binary projections, considering only the existence or not of a given edge. Some common interesting cases have been explicitly developed as examples such as fixed strength sequence, degree sequence, cost, strength sequence plus cost, total number of binary edges, and both strength and degree fixed sequences. Previous results found in the literature have been recovered, specifically the correlations of the configurational model (for binary constraints on the degree sequence) and the absence of correlation between occupation numbers and degree once the degree sequence is fixed among others. Our treatment uncovers explicit relations between the binary occupation probability of an edge and its expected occupation number, which allows one to fully characterize binary networks as projections of multiedge graph instances. These results permit an extensive treatment of the finite size effects present in this kind of network, since they are fully valid both in a dense scenario and intermediate cases.
Furthermore, we have explicitly derived general forms for the probability of obtaining a given graph with given constraints in terms of its adjacency matrix elements statistics, which can be understood as the canonical and grand-canonical description of the considered ensembles. Such expressions permit explicit entropy measures, useful for the analysis of similar systems for data inference . Additionally, the analytical character of the presented methodology allows for the explicit calculation of some network metrics using probabilistic examples such a clustering and strength-degree correlations, for instance. As a complement we have also introduced the main ideas which can lead to the efficient generation of multiedge graphs though its details are left for development in future work.
The applications of the theory developed here can be used in a wide variety of fields, mainly all those network studies applied to data generated by agent-based systems and in particular in the very active transportation research area, where networks are the result of complex, time-dependent, mobility patterns . It also opens the door to extensions of this kind of network to multiple layer scenarios .
1 Please note that the framework we present permits self-edges if one desires; it is just a matter of extending the sums over available states over all values of
This work has been partially supported by the Spanish DGICYT Grant No. FIS2009-13730-C02-01 and by the Generalitat de Catalunya 2009-SGR-00838. O.S. has been supported by the Generalitat de Catalunya through the FI Program.
In this section we present the cumulant generating functions for the different models proposed (Bernoulli, multinomial, Poisson, and zero-inflated Poisson) and show that their close relationship to the expressions developed in the main text.
The cumulant generating function of the probability distribution
Note that if we consider the joint distribution of two (or more) independent variables
Having introduced that, if we start at the microcanonical level [Eq. ] and identify
we clearly see that
and the relation between both objects is apparent.
Starting at the level where only the number of events
Despite having additional terms, with regards to differentiation with respect to
Considering now only the most general case in Eq. we have again (adding external fields for the binary projection
Dropping the constraints on the binary projection, i.e.,
where we identified
Dropping the constraints on the multilink nature of the network
where we identified
Finally, the most general case can be mapped to a mixed zero-inflated Poisson process as we shall prove: Imagine the outcome of a process in which we sort
which represents a probability measure over an integer quantity. We can compute the mean and variance of
The obtained expressions need to be compared with , which allows one to identify
which is identical to the argument in the sum of Eq. (except for a linear constant) and captures the more general case considered.
Throughout this paper we have uncovered the mathematical expressions allowing one to generate networks under different ensembles using a probabilistic framework over
-
(1) Canonical ensemble (linear constraints on
):𝑡 𝑖 𝑗 -
(2) Grandcanonical ensemble [linear constraints on
and/or on𝛩 ( 𝑡 𝑖 𝑗 ) ]:𝑡 𝑖 𝑗 -
(3) Microcanonical ensemble. This ensemble can be used by generating sequences of
using the two above expressions and discarding those not corresponding exactly with the imposed constraints.{ 𝑡 𝑖 𝑗 }
We have shown already that the relative fluctuations of the linear constraints on the occupation numbers
To do so, we make use of the properties of the multinomial distribution to recover it under the cases where no constraints or only linear constraints on
The outcome of a process in which we sort different independent Poisson variables of parameters
And since the resulting occupation numbers statistics derive to a Poisson distribution, which has vanishing fluctuations on the thermodynamic limit (
Let us assume that instead of having
and we now perform the finite sum inside the integral,
And for the occupation numbers and edge probability obtain
which converge extremely quickly to the obtained results for
We here report an additional example which may be of interest for studies on transportation origin-destination matrices, where some forms of trip-cost distribution have been discussed .
Starting from Sec. (), it is a matter of considering the additional term,
on the equations, where
where
The expressions of
And the prior considerations are also valid. Note also that keeping the multinomial framework, all the quantities considered are intensive, while the expected strength and average occupation numbers remain extensive variables. Additionally, in the thermodynamic limit these quantities have vanishing relative fluctuations.
References (40)
- R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
- M. E. J. Newman, Networks: An Introduction (Oxford University Press, Oxford, 2010), p. 720.
- S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks: From Biological Nets to the Internet and WWW (Oxford University Press, Oxford, 2003).
- J. Park and M. E. J. Newman, Phys. Rev. E 70, 066117 (2004).
- D. Garlaschelli and M. Loffredo, Phys. Rev. Lett. 102, 038701 (2009).
- G. Krings, F. Calabrese, C. Ratti, and V. D. Blondel, J. Stat. Mech.: Theory Exp. (2009) L07003.
- M. Barthélemy, Phys. Rep. 499, 1 (2011).
- P. Kaluza, A. Kölzsch, M. T. Gastner, and B. Blasius, J. R. Soc. Interface R. Soc. 7, 1093 (2010).
- C. Roth, S. M. Kang, M. Batty, and M. Barthélemy, PLoS One 6, e15923 (2011).
- A. G. Wilson, Transportation Research 1, 253 (1967).
- A. Annibale, A. C. C. Coolen, L. P. Fernandes, F. Fraternali, and J. Kleinjung, J. Phys. A. Math. Gen. 42, 485001 (2009).
- S. E. Ahnert, D. Garlaschelli, T. M. A. Fink, and G. Caldarelli, Phys. Rev. E 76, 016101 (2007).
- F. Simini, M. C. González, A. Maritan, and A.-L. Barabási, Nature (London) 484, 96 (2012).
- P. Expert, T. S. Evans, V. D. Blondel, and R. Lambiotte, Proc. Natl. Acad. Sci. 108, 7663 (2011).
- P. Holme and J. Saramäki, Phys. Rep. 519, 97 (2012).
- D. Garlaschelli and M. I. Loffredo, Phys. Rev. E 78, 015101(R) (2008).
- A. Wilson, Geogr. Anal. 42, 364 (2010).
- M. Barthélemy, A. Barrat, R. Pastor-Satorras, and A. Vespignani, Physica A 346, 34 (2005).
- G. Bianconi, Europhys. Lett. 81, 28005 (2008).
- B. M. Waxman, Sel. Areas Commun. IEEE J. 6, 1617 (1988).
- A. Bazzani, B. Giorgini, S. Rambaldi, R. Gallotti, and L. Giovannini, J. Stat. Mech.: Theory Exp. (2010) P05001.
- X. Liang, X. Zheng, W. Lv, T. Zhu, and K. Xu, Physica A 391, 2135 (2012).
- M. A. Serrano and M. Boguna, AIP Conf. Proc. 776, 101 (2005).
- M. A. Serrano, M. Boguna, and R. Pastor-Satorras, Phys. Rev. E 74, 055101 (2006).
- T. Squartini and D. Garlaschelli, New J. Phys. 13, 083001 (2011).
- S. Erlander and N. F. Stewart, The Gravity Model in Transportation Analysis: Theory and Extensions (CRC Press, Boca Raton, 1990).
- S. Goh, K. Lee, J. S. Park, and M. Y. Choi, Phys. Rev. E 86, 026102 (2012).
- J. Park and M. E. J. Newman, Phys. Rev. E 70, 066117 (2004).
- D. Lambert, Technometrics 34, 1 (1992).
- R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth, Advances in Computational Mathematics (Springer, New York, 1996), pp. 329–359.
- G. Bianconi, A. C. C. Coolen, and C. J. Perez Vicente, Phys. Rev. E 78, 016114 (2008).
- P. Erdős and A. Rényi, Publications of the Mathematical Institute of the Hungarian Academy of Sciences (Hungarian Academy of Sciences, Budapest, 1960), pp. 17–61.
- A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani, Proc. Natl. Acad. Sci. 101, 3747 (2004).
- M. Boguñá and R. Pastor-Satorras, Phys. Rev. E 68, 036112 (2003).
- K. Anand, G. Bianconi, and S. Severini, Phys. Rev. E 83, 036109 (2011).
- Z. Burda and A. Krzywicki, Phys. Rev. E 67, 046118 (2003).
- G. Bianconi, P. Pin, and M. Marsili, Proc. Natl. Acad. Sci. 106, 11433 (2009).
- P. Colomer-de-Simon and M. Boguñá, Phys. Rev. E 86, 026120 (2012).
- G. Bianconi, Phys. Rev. E 87, 062806 (2013).
- A. Solé-Ribalta, M. De Domenico, N. E. Kouvaris, A. Díaz-Guilera, S. Gómez, and A. Arenas, Phys. Rev. E 88, 032807 (2013).