• Institution: Staats- und Universitaetsbibliothek Bremen
  • http://www.keystonesymposia.org/2014meetings#utm_source=PNAS&utm_medium=banner&utm_campaign=2014PNAS
  • Science Sessions: The PNAS Podcast Program

Assessing the relevance of node features for network structure

  1. Ginestra Bianconia,
  2. Paolo Pinb,c,1 and
  3. Matteo Marsilia
  1. Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved April 21, 2009 (received for review December 1, 2008)

Abstract

Networks describe a variety of interacting complex systems in social science, biology, and information technology. Usually the nodes of real networks are identified not only by their connections but also by some other characteristics. Examples of characteristics of nodes can be age, gender, or nationality of a person in a social network, the abundance of proteins in the cell taking part in protein-interaction networks, or the geographical position of airports that are connected by directed flights. Integrating the information on the connections of each node with the information about its characteristics is crucial to discriminating between the essential and negligible characteristics of nodes for the structure of the network. In this paper we propose a general indicator Θ, based on entropy measures, to quantify the dependence of a network's structure on a given set of features. We apply this method to social networks of friendships in U.S. schools, to the protein-interaction network of Saccharomyces cerevisiae and to the U.S. airport network, showing that the proposed measure provides information that complements other known measures.

Footnotes

  • 1To whom correspondence should be addressed. E-mail: pin3@unisi.it
  • Author contributions: G.B., P.P., and M.M. designed research; G.B., P.P., and M.M. performed research; G.B., P.P., and M.M. analyzed data; and G.B., P.P., and M.M. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • * A community structure, in general terms, is an assignment of nodes into classes. Community detection aims at partitioning nodes into homogeneous classes, according to similarity or proximity considerations.

  • * To be precise, here k i = Σjg i,j is the degree and A(q,q′) = Σi,jg i,jδqi,qδqj,q is the number of links between nodes with attribute q and q′.

  • In other words, Formula is the Gibbs–Boltzmann entropy of the ensemble of graphs which assigns equal weight to each graph g satisfying the constraints, which is equivalent to the usual Shannon entropy of the distribution of graphs in this ensemble.

  • A more precise estimate of the probability of occurrence of a given value of Θ would entail the study of large deviation properties of the entropy distribution. This goes beyond our present purposes.

  • § This limitation is imposed by the fact that the saddle point method we use to evaluate the entropy is reliable only if the number of imposed constraints N + Q 2 is of the same order of magnitude of N.

  • A plausibility argument for the scaling behavior is the following: Consider a particular permutation π and imagine making a small number nN of further perturbations by exchanging assignments on pairs of randomly chosen nodes. Each such perturbation is likely to affect a different part of the network, which means that the associated changes in the entropy can be considered as uncorrelated. Hence, we expect a change in the entropy density of the order of Formula. This is expected to hold true also for n/N finite but small suggesting that, as N increases, the difference between the entropies of 2 random permutations—and hence the denominator in Eq. 3—is of order Formula.

  • This article contains supporting information online at www.pnas.org/cgi/content/full/0811511106/DCSupplemental.

  • Freely available online through the PNAS open access option.

HighWire Press-hosted articles citing this article