Original Article

Statistical clustering of temporal networks through a dynamic stochastic block model

Catherine Matias

Corresponding Author

E-mail address:catherine.matias@math.cnrs.fr

Centre National de la Recherche Scientifique, Université Pierre et Marie Curie and Laboratoire de Probabilités et Modèles Aléatoires, Paris, France

Address for correspondence: Catherine Matias, Université Pierre et Marie Curie, Case courvieux 188, 4 place Jussieu, 75252 Paris Cedex 05, France. E‐mail: E-mail address:catherine.matias@math.cnrs.fr
Search for more papers by this author
Vincent Miele

Université de Lyon, Université Lyon 1, Centre National de la Recherche Scientifique and Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France

Search for more papers by this author
First published: 22 August 2016
Cited by: 7
Go here for SFX

Summary

Statistical node clustering in discrete time dynamic networks is an emerging field that raises many challenges. Here, we explore statistical properties and frequentist inference in a model that combines a stochastic block model for its static part with independent Markov chains for the evolution of the nodes groups through time. We model binary data as well as weighted dynamic random graphs (with discrete or continuous edges values). Our approach, motivated by the importance of controlling for label switching issues across the different time steps, focuses on detecting groups characterized by a stable within‐group connectivity behaviour. We study identifiability of the model parameters and propose an inference procedure based on a variational expectation–maximization algorithm as well as a model selection criterion to select the number of groups. We carefully discuss our initialization strategy which plays an important role in the method and we compare our procedure with existing procedures on synthetic data sets. We also illustrate our approach on dynamic contact networks: one of encounters between high school students and two others on animal interactions. An implementation of the method is available as an R package called dynsbm.

1 Introduction

Statistical network analysis has become a major field of research, with applications as diverse as sociology, ecology, biology and the Internet. General references on statistical modelling of random graphs include Kolaczyk (2009) and Goldenberg et al. (2010) and Snijders (2011). Whereas static approaches were developed as early as in the 1960s (mostly in the field of sociology), the literature concerning dynamic models is much more recent. Modelling discrete time dynamic networks is an emerging field that raises many challenges and we refer to Holme (2015) for a most recent review.

An important part of the literature on static network analysis is dedicated to clustering methods, with both aims of taking into account the intrinsic heterogeneity of the data and summarizing these data through node classification. Among clustering approaches, community detection methods form a smaller class of methods that aim at finding groups of highly connected nodes. Our focus here is not only on community detection but also more generally on node classification based on connectivity behaviour, with a particular interest on model‐based approaches (see for example Matias and Robin (2014)). When considering a sequence of snapshots of a network at different time steps, these static clustering approaches will give rise to classifications that are difficult to compare through time and thus difficult to interpret. Important to note is that label switching between two successive time steps may not be solved without an extra assumption, e.g. that most of the nodes do not change group across two different time steps. However, to our knowledge, this kind of assumption has never been discussed in the literature. In this work, we are interested in statistical models for discrete time dynamic random graphs, with the aim of providing a node classification varying with time, while controlling for label switching issues across the different time steps. Our answer to this challenge will be to focus on the detection of groups that are characterized by stable within‐group connectivity behaviour. We believe that this is particularly suited to dynamic contact networks.

Stochastic block models (SBMs) form a widely used class of statistical (and static) random graph models that provide a clustering of the nodes. SBMs introduce latent (i.e. unobserved) random variables on the nodes of the graph, taking values in a finite set. These latent variables represent the node groups and interaction between two nodes is governed by these corresponding groups. The model includes (but is not restricted to) the specific case of community detection, where within‐group connections have higher probability than across‐group connections. Combining SBMs with a Markov structure on the latent part of the process (the nodes classification) is a natural way of ensuring a smooth evolution of the groups across time and has already been considered in the literature. Yang et al. (2011) considered undirected, either binary or finitely valued, discrete time dynamic random graphs. The static aspect of the data is handled through SBMs, whereas its dynamic aspect is as follows. For each node, its group membership forms a Markov chain, independent of the values of the other node memberships. There, only the group membership is allowed to vary across time whereas connectivity parameters among groups stay constant through time. Yang et al. (2011) proposed a method to infer these parameters (either on line or off line), based on a combination of Gibbs sampling and simulated annealing. For binary random graphs, Xu and Hero (2014) proposed to introduce a state space model through time on (the logit transform of) the probability of connection between groups. In contrast with previous work, both group membership and connectivity parameters across groups may vary through time. As such, we shall see below that this model has a strong identifiability problem. Their (on‐line) iterative estimation procedure is based on alternating two steps: a label switching method to explore the space of node group configurations, and the (extended) Kalman filter that optimizes the likelihood when the group memberships are known. Note that neither Yang et al. (2011) nor Xu and Hero (2014) proposed to infer the number of clusters. Bayesian variants of these dynamic SBMs may be found for instance in Ishiguro et al. (2010) and Herlau et al. (2013).

Surprisingly, we noted that the above‐mentioned methods were evaluated on synthetic data sets in terms of the averaged value over the time steps of a clustering quality index computed at a fixed time step. Naturally, those indices do not penalize for label switching and two classifications that are identical up to a permutation have the highest quality index value. Computing an index for each time step, the label switching issue between different time steps disappears and the classification task becomes easier. Indeed, such criteria do not control for a smoothed recovery of groups along different time points. It should also be noted that the synthetic experiments from these works were performed under the dynamic version of the binary affiliation SBM, which has non‐identifiable parameters. The affiliation SBM, also known as the planted partition model, corresponds to the case where the connectivity parameter matrix has only two different values: a diagonal value that drives within‐group connections and an off‐diagonal value for across‐group connections. In particular, the label switching issue between different time steps may not be easily removed in this particular case.

Other approaches for model‐based clustering of dynamic random graphs do not rely directly on SBMs but rather on variants of the SBM. We mention the random subgraph model that combines SBMs with the a priori knowledge of a nodes partition (inducing subgraphs), by authorizing the group proportions to differ in the different subgraphs. A dynamic version of the random subgraph model that builds on the approach of Xu and Hero (2014) appears in Zreik et al. (2016). Detection of persistent communities has been proposed in Liu et al. (2014) for directed and dynamic graphs of call counts between individuals. Here the static underlying model is a time‐ and degree‐corrected SBM with Poisson distribution on the call counts. Group memberships are independent through time instead of Markov, but smoothness in the classification is obtained by imposing that within‐group expected call volumes are constant through time. Inference is performed through a heuristic greedy search in the space of group memberships. Only real data sets and no synthetic experiments have been explored in this latter work.

Another very popular statistical method for analysing static networks is based on latent space models. Each node is associated with a point in a latent space and probability of connection is higher for nodes whose latent points are closer (Hoff et al., 2002). In Sarkar and Moore (2005) a dynamic version of the latent space model is proposed, where the latent points follow a (continuous state space) Markov chain, with transition kernel given by a Gaussian perturbation on the current position with zero mean and small variance. Latent position inference is performed in two steps: a first initial guess is obtained through multi‐dimensional scaling. Then, non‐linear optimization is used to maximize the model likelihood. The work by Xu and Zheng (2009) is very similar, adding a clustering step on the nodes. Finally, Heaukulani and Ghahramani (2013) relied on Markov chain Monte Carlo methods to perform a Bayesian inference in a more complicated set‐up where the latent positions of the nodes are not independent.

Mixed membership models (Airoldi et al., 2008) are also explored in a dynamic context. The work by Xing et al. (2010) relies on a state space model for the evolution of the parameters of the priors of both the mixed membership vector of a node and the connectivity behaviour. Inference is carried out through a variational Bayes expectation–maximization (EM) algorithm e.g. Jordan et al. (1999).

This non‐exhaustive bibliography on model‐based clustering methods for dynamic random graphs shows both the importance and the huge interest in the topic.

In the present work, we explore statistical properties and frequentist inference in a model that combines an SBM for its static part with independent Markov chains for the evolution of the node groups through time. Our approach aims to achieve both interpretability and statistical accuracy. Our set‐up is very close to those of Yang et al. (2011) and Xu and Hero (2014), the first and main difference being that we allow for both group memberships and connectivity parameters to vary through time. By focusing on groups that are characterized by a stable within‐group connectivity behaviour, we can ensure parameter identifiability and thus valid statistical inference. Indeed, whereas Yang et al. (2011) used the strong constraint of fixed connectivity parameters through time, Xu and Hero (2014) entirely relaxed this assumption at the (not acknowledged) cost of a label switching issue between time steps. Second, we model binary data as well as weighted random graphs, whether they are dense or sparse, with discrete or continuous edges. Third, we propose a model selection criterion to choose the number of clusters. To simplify the notation, we restrict our model to undirected random graphs with no self‐loops but easy generalizations would handle directed data sets and/or including self‐loops.

The paper is organized as follows. Section 2.1 describes the model and sets the notation. Section 2.2 gives intuition about the identifiability issues that are raised by authorizing both group memberships and connectivity parameters to vary freely with time. This was not pointed out by Xu and Hero (2014) although they worked in this context. The section motivates our focus on groups that are characterized by a stable within‐group connectivity behaviour. Section 2.3 then establishes our identifiability results. To our knowledge, it is the first dynamic random graph model where parameter identifiability (up to label switching) is discussed and established. Then, Section 3 describes a variational expectation–maximization (VEM) procedure for inferring the model parameters and clustering the nodes. The VEM procedure works with a fixed number of groups and an integrated classification likelihood (Biernacki et al., 2000) criterion is proposed for estimating the number of groups. We also discuss initialization of the algorithm—an important but rarely discussed step, in Section 3.2. Synthetic experiments are presented in Section 4. There, we discuss classification performances without neglecting the label switching issue that may occur between time steps. In Section 5, we illustrate our approach with the analysis of real life contact networks: a data set of encounters between high school students and two other data sets of animal interactions. We believe that our model is particularly suited to handling this type of data. We mention that the methods are implemented in an R package that is available from http://lbbe.univ-lyon1.fr/dynsbm. On‐line supplementary materials complete the paper.

2 Set‐up and notation

2.1 Model description

We consider weighted interactions between N individuals recorded through time in a set of data matrices urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0001. Here T is the number of time points and, for each value t ∈ {1,…,T}, the adjacency matrix urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0002 contains real values measuring interactions between individuals urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0003. Without loss of generality, we consider undirected random graphs without self‐loops, so that urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0004 is a symmetric matrix with no diagonal elements.

We assume that the N individuals are split into Q latent (unobserved) groups that may vary through time, as encoded by the random variables urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0005 with values in urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0006. This process is modelled as follows. Across individuals, random variables urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0007 are independent and identically distributed. Now, for each individual i ∈ {1,…,N}, the process urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0008 is an irreducible, aperiodic stationary Markov chain with transition matrix urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0009 and initial stationary distribution urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0010. When no confusion occurs, we may alternatively consider urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0011 as a value in urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0012 or as a random vector urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0013 constrained to urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0014.

Given latent groups Z, the time varying random graphs urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0015 are independent, the conditional distribution of each urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0016 depending only on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0017. Then, for fixed 1⩽tT, random graph urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0018 follows an SBM. In other words, for each time t, conditionally on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0019, random variables urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0020 are independent and the distribution of each urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0021 depends on only urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0022 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0023. For now, we assume a very general parametric form for this distribution on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0024. Following Ambroise and Matias (2012), to take into account possible sparse weighted graphs, we explicitly introduce a Dirac mass at 0, denoted by urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0025, as a component of this distribution. More precisely, we assume that

(1)
where {F(·,γ),γ ∈ Γ} is a parametric family of distributions with no point mass at 0 and densities (with respect to Lebesgue or counting measure) denoted by f(·,γ). This could be the Gaussian family with unknown mean and variance, the truncated Poisson family on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0027 (leading to a 0‐inflated or 0‐deflated distribution on the edges of the graph), a finite space distribution on M‐values (a case which comprises non‐parametric approximations of continuous distributions through discretization into a finite number of M bins), etc. The binary case is encompassed in this set‐up with urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0028, namely the parametric family of laws is reduced to a single point, the Dirac mass at 1 and conditional distribution of urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0029 is simply a Bernoulli urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0030 distribution. In what follows and in contrast with the ‘binary case’, we shall call the ‘weighted case’ any set‐up where the set of distributions F is parameterized and not reduced to a single point. Here, the sparsity parameters urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0031 satisfy urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0032, with urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0033 corresponding to the particular case of a complete weighted graph. As a result of considering undirected graphs, the parameters urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0034 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0035 moreover satisfy urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0036 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0037 for all 1⩽q,lQ. For the moment, SBM parameters may be different across time points. We shall return to this point in the next sections. The model is thus parameterized by
and we let urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0039 denote the probability distribution on the whole space urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0040. We also let ϕ(·;β,γ) denote the density of the distribution given by expression (1), namely
where 1{A} is the indicator function of set A. With some abuse of notation and when no confusion occurs, we shorten urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0042 to urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0043 or urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0044. Directed acyclic graphs describing the dependence structure of the variables in the model with different levels of detail are given in Fig. 1. Note that the model assumes that the individuals are present at any time in the data set. An extension that covers the case where some nodes are not present at every time point is given in section S.5 in the on‐line supplementary materials and used in analysing the animal data sets from Section 5.2.

Dependence structures of the model: (a) general view corresponding to a hidden Markov model structure; (b) details on latent structure organization corresponding to N different independent and identically distributed Markov chains urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0045 across individuals; (c) details for fixed time point t corresponding to an SBM structure

2.2 Varying connectivity parameters versus varying group membership

In this section, we give some intuition into why it is not possible to let both connectivity parameters and group membership vary through time without entering label switching issues between time steps. For this, we consider the toy example in Fig. 2.

Connectivity parameters or group membership variation—a toy example: (a) urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0046; (b) urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0047

Fig. 2 shows a graph between N=12 nodes at two different time points urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0048 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0049. Node 1 is a hub (namely a highly connected node), nodes 2–6 form a community at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0050 (they tend to form a clique) and are peripheral nodes at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0051 and finally nodes 7–12 are peripheral at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0052 and become a community at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0053. In observing those two graphs (without the clusters indicated by the shading of the nodes), there are at least two possible statistical interpretations relying on a clustering with Q=3 groups. The first (which is illustrated in Fig. 2) is to consider that the three different groups at stake are hubs (in white), a community (light grey) and peripheral nodes (dark grey) and that the nodes 2–6 change group from a community to peripheral group between time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0054 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0055 whereas nodes 7–12 change from peripheral group to a community between those same time points (node 1 stays a hub, in white, for both time points). Another point of view would rather be to consider that nodes 2–6 stayed in the same group that was organized as a community at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0056 and is now characterized by peripheral behaviour at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0057, whereas nodes 7–12 also stayed in the same group, behaving peripherally at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0058 and now as a community at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0059. Obviously neither of these two interpretations is better than the other. Without adding constraints on the model, the label switching phenomenon will randomly output one of these two interpretations (clustering at time urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0060 is the same when permuting light grey and dark grey colours). In this context, it is thus impossible to recover group membership trajectories. We formalize these ideas in the next section through the concept of parameter identifiability.

The main problem with the previous example comes from the possibility of arbitrarily relabelling the groups between two time steps. We mentioned in Section 1 that a natural idea would be that most of the individuals should not change groups between successive time steps. However, imposing constraints on the transition matrix π (for example it has large diagonal elements) is useless because estimation would then be unfeasible. Indeed, without imposing 0‐ values on the off‐diagonal elements of π (i.e. urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0061 does not depend on t), it can happen that there is no labelling of the groups that ensures that most individuals stay in the same group. Thus it is not always possible to label the groups so that, between two successive time steps, estimation of the transition parameters would be constrained to have large diagonal elements. Thus we choose to focus our attention on groups that are characterized through their stable within‐group connectivity parameter. This choice is reminiscent of work on detection of persistent communities (Liu et al., 2014), except that we do not restrict our attention to communities (i.e. groups of highly connected individuals). In this toy example, this corresponds to the first interpretation rather than the second. Other choices could be made and we believe that this one is particularly suited to model social networks or contact data where the groups are defined as structures exhibiting stable within‐group connectivity behaviour and individuals may change groups through time (see Section 5 for applications on real data sets).

2.3 Parameter identifiability

Recall that, with discrete latent random variables, identifiability can only be obtained up to label switching on the node groups urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0062. For any permutation σ in urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0063 (the set of permutations on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0064) and any θ ∈ Θ, we define

It should be noted that, here, the permutation σ acts globally, meaning that it is the same at each time point t. Now, if we let urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0066 denote the marginal of urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0067 on the set of observations Y, identifiability of the parameterization, up to label switching, means that

Without additional constraints on the transition matrix π or on the parameters (β,γ), the parameters may not be recovered up to label switching. However, it could be that the static SBM part of the parameter is recovered up to local label switching. Local label switching on the SBM part of the parameter is the weaker property

This property is not satisfactory since clustering in models that satisfy only local identifiability of the SBM part of the parameter prevents us from obtaining a picture of the evolution of the groups across time.

A formal example of the fact that, if both urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0070 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0071 may vary through time, then the parameter cannot be identified up to label switching without additional constraints is given in section S.1 in the on‐line supplementary materials. We stress that this example implies that a dynamic affiliation SBM (or planted partition model) does not have identifiable parameters and groups may not be recovered consistently across time. This is an important point as previous researchers have tried to recover groups from this type of synthetic data sets and evaluated their estimated classification in a non‐natural way.

As a consequence and following the ideas that were developed in Section 2.2, we choose to impose the following constraints on the parameter θ:

(2)

Under this condition, we focus on groups that are characterized by stable within‐group connectivity behaviour (urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0073 or urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0074 is constant with time). This constraint could in fact be weakened as follows:

(3)
In this condition, the group l that helps to characterize the group q between the two different time points t and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0076 may depend on q, t and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0077. Such a constraint may be useful if groups are not characterized by stable within‐group connectivity but rather by their connectivity to at least one specific other group. For estimation, this group l needs to be known in advance (for each q, t and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0078) which requires more complex a priori modelling of the data. In what follows, we choose to restrict our attention to constraint (2) only but our theoretical results remain valid under constraint (3). We prove below that these constraints, combined with the same conditions that are used for identifiability in the static case, are sufficient to ensure identifiability of the parameterization in our dynamic set‐up.

Assumption 1. ([weighted case.])We assume that

  1. for any t⩾1, the Q(Q+1)/2 values urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0079 are distinct and
  2. the family of distributions urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0080 is such that all elements f(·,γ) have no point mass at 0 and the parameters of finite mixtures of distributions in urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0081 are identifiable, up to label switching.

Assumption 1 is the condition that ensures identifiability of static weighted SBMs (see theorem 12 in Allman et al. (2011)). Note that it does not impose any constraint on the sparsity parameters urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0082 in the weighted case. In particular and for parsimony, these may be chosen identical (to some urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0083 or some constant β) or set to two different values, e.g. urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0084 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0085 whenever ql at each time point (or even constant with time).

Proposition 2.Considering the distribution urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0086 on the set of observations and assuming constraint (2), the parameter θ=(π,β,γ) satisfies the following conditions.

  1. Binary case: θ is generically identified from urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0087, up to label switching, as soon as N is not too small with respect to Q.
  2. Weighted case: under additional assumption 1, the parameter θ is identified from urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0088, up to label switching, as soon as N⩾3.

Generic identifiability means ‘up to excluding a subset of zero Lebesgue measure of the parameter set’. We refer to Allman et al. (2009, 2011) for more details. In particular for the binary case, assuming that the matrix of Bernoulli parameters β has distinct rows is a generic constraint (meaning that it removes a subset of zero Lebesgue measure of the parameter set). As we do not specify the whole generic constraint that is needed here, we do not stress that one either. But the reader should have it in mind in the binary set‐up. Finally, note that the condition on the number of nodes N being not too small in the binary case is given precisely in theorem 2 from Allman et al. (2011). The particular affiliation case (planted partition) is not covered by these results and is further discussed in section S.2 in the on‐line supplementary materials.

Proof.The proof combines the approaches of Leroux (1992) for proving identifiability of hidden Markov model parameters and Allman et al. (2011) that studies identifiability for (static) SBMs.

First, we fix a time point t⩾1 and consider the marginal distribution urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0089. According to theorems 1 and 2 (binary case with Q=2 and Q⩾3 respectively) and theorem 12 (weighted case) in Allman et al. (2011) on parameter identifiability in static SBMs, there is a permutation urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0090 on the group labels urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0091 such that we can identify urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0092 as well as the marginal distribution α, up to this permutation. This result stands generically in the binary case only.

Now, for two different time points t and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0093, we use constraint (2) and the assumption of distinct parameter values to identify the parameters urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0094 up to a (common) permutation σ on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0095. Indeed, in the binary case, assuming that the within‐groups Bernoulli parameters satisfy urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0096 and that the set urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0097 contains Q distinct values (a generic constraint) suffices to obtain a global permutation σ, not depending on time t, up to which urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0098 are identified. The same applies in the weighted case, by assuming equality between the parameter urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0099 for any t and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0100.

It remains to identify the transition matrix π (up to the same permutation σ). We fix an edge (i,j) and, following Leroux (1992), consider the bivariate distribution urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0101. This is given by

(4)
Note that Teicher (1967) has proved the equivalence between parameter identifiability for the mixtures of a family of distributions and parameter identifiability for the mixtures of finite products from this same family. For clarity, we develop his proof adapted to our context. We thus write
As the mixtures from the family urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0104 have identifiable parameters (assumption 1, part (b)), we can identify the mixing distribution
Now, applying again this identifiability at time t and constraint (1), we can identify the whole mixing distribution
This proves that the mixture given by equation 4 has identifiable components. From this mixture and the fact that we have already identified the parameters (β,γ) up to a global permutation, we can extract the set of coefficients urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0107 that corresponds to the components urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0108 in equation 4. As we have also already obtained the values urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0109, this now identifies the parameters urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0110. This concludes the proof.

3 Inference algorithm

3.1 General description

As usual with latent variables, the log‐likelihood urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0111 contains a sum over all possible latent configurations Z and thus may not be computed except for small values of N and T. A classical solution is to rely on an EM algorithm (Dempster et al., 1977), which is an iterative procedure that finds local maxima of the log‐likelihood. The use of the EM algorithm relies on the computation of the conditional distribution of the latent variables Z given the observed variables Y. However, in the context of SBMs, this distribution does not have a factored form and thus may not be computed efficiently. A classical solution is to rely on variational approximations of the EM algorithm: the VEM algorithm (see for instance Jordan et al. (1999)). These approximations were first proposed in the context of SBMs in Daudin et al. (2008) and later developed in many directions, such as on‐line procedures (Zanghi et al., 2008, 2010) or Bayesian VEM algorithms (Latouche et al., 2012). We refer to Matias and Robin (2014) for more details about the VEM algorithm (in particular a presentation of the EM algorithm viewed as a special instance of the VEM algorithm) and its comparison with other estimation procedures in SBMs. Note that convergence properties of VEM algorithms are discussed in full generality in Gunawardana and Byrne (2005) and in the special case of SBMs in Celisse et al. (2012) and Bickel et al. (2013).

3.1.1 Variational expectation–maximization for dynamic stochastic block models

In our context of dynamic random graphs, we start by writing the complete‐data log‐likelihood of the model

(5)
We now explore the dependence structure of the conditional distribution urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0113. First, note that it can be easily deduced from the directed acyclic graph of the model (Fig. (1a)) that
However, the distribution urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0115 cannot be further factored. Indeed, for any ij, the variables urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0116 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0117 are not independent when conditioned on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0118. Our variational approximation naturally considers the following class of probability distributions urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0119 parameterized by τ:
where, for any values urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0121, we have τ(i,q) and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0122 both belong to the set [0,1] and are constrained by urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0123 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0124. This class of probability distributions urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0125 corresponds to considering independent laws through individuals, whereas, for each i ∈ {1,…,N}, the distribution of urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0126 under urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0127 is the distribution of a Markov chain (through time t), with inhomogeneous transition urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0128 and initial distribution urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0129.

We shall need the marginal components of urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0130, namely urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0131. These quantities are computed recursively by

Note also that all the values urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0133 depend on the initial distribution τ(i,q). The entropy of urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0134 is denoted by urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0135. Using this class of probability distributions on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0136, the VEM algorithm is an iterative procedure to optimize the criterion
(6)
It consists of iterating the following two steps. At the kth iteration, with current parameter value urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0138, we do the following steps.
  1. VE step: compute urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0139.
  2. M‐step: compute urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0140.

Proposition 4.The value urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0141 that maximizes in τ the function J(θ,τ) satisfies the fixed point equation

where ‘∝’ means ‘proportional to’ (the constants are obtained by the constraints on τ). Moreover, the value urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0143 that maximizes in (π,β) the function J(θ,τ) satisfies

The proof of this result is immediate and has been omitted. Note that we have given a formula with constant (through time) values urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0145 for any group urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0146. Whereas this assumption is an identifiability requirement in the binary set‐up, it is not necessary in the weighted case. In the weighted case, we use it only for parsimony. The corresponding formula when this parameter is not assumed to be constant may be easily obtained.

To complete the algorithm's description, we provide equations to update the parameters τ(i,q) and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0147 of initial distributions as well as the connectivity parameter γ. First, optimization of J(θ,τ) with respect to the initialization parameters τ(i,q) is a little more involved. By neglecting the dependence on τ(i,q) of some terms appearing in criterion J, we choose to update this value by solving the fixed point equation

(7)
Our experiments show that this is a reasonable approximation (Section 4). For completeness, we provide in section S.3 in the on‐line supplementary materials the exact equation satisfied by the solution.

Now parameter α is not obtained from maximizing J as it is not a free parameter but rather the stationary distribution associated with transition π. Thus, α is obtained from the empirical mean of the marginal distribution urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0149 over all data points

Finally, optimization with respect to γ depends on the choice of the parametric family {f(·,γ),γ ∈ Γ}. We provide explicit formulae for the most widely used families of conditional distributions on the edges (binary or weighted case) in section S.4 in the on‐line supplementary materials. More precisely, we give these formulae for Bernoulli, finite space, (zero‐inflated or deflated) Poisson and Gaussian homoscedastic distributions.

Remark 5.Performing the EM algorithm in a hidden Markov model (Fig. 1(a)) requires the use of forward–backward equations to deal with transition terms urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0151 appearing in the complete‐data log‐likelihood (5). In our set‐up, forward–backward equations are useless and replaced by a variational approximation. Indeed, it can be seen from Fig. 1(b) that the conditional distribution of urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0152 given the data cannot be computed exactly through such forward–backward equations. This is because the variables urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0153 depend on all hidden variables urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0154 and focusing only on urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0155 is not sufficient to determine their distribution.

Remark 6.Yang et al. (2011) derived a VEM procedure in a similar (slightly less general) set‐up, but their variational approximation uses independent marginals (through individuals and also time points). As a consequence, the VE step that they derive is more involved than ours (see section 4 in Yang et al. (2011)).

3.1.2 Model selection

Model selection on the number of groups Q is an important step. In the case of latent variables, when the true data likelihood may not be easily computed, model selection may be done by maximizing an integrated classification likelihood (ICL) criterion (Biernacki et al., 2000). For any number of groups Q⩾1, let urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0156 be the estimated parameter value with Q groups and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0157 the corresponding maximum a posteriori classification at urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0158. In our case, the general form of the ICL is given by

(8)
where the first penalization term accounts for transition matrix π and pen(N,T,β,γ) is a penalizing term for the connectivity parameters (β,γ). As the number of parameters (β,γ) depends on the specific form of the family {f(·;γ),γ ∈ Γ}, we provide context‐dependent expressions for the ICL in section S.4 in the on‐line supplementary materials (along with the expressions of parameter estimates from the M‐step for each case considered). Note that the first penalization term accounts for N(T−1) latent transitions whereas the number of observations corresponding to the SBM part of the parameter in pen(N,T,β,γ) will be different. We refer to Daudin et al. (2008) for an expression of the ICL in the static SBM that shows an analogous difference in penalizing group proportions or connectivity parameters. Note that there are no theoretical results on the convergence of the ICL procedure (nor in simple mixture models nor in the SBM case). However, the criterion shows very good performances on synthetic experiments and is widely used (see Section 4 for experiments in our set‐up). Nonetheless we mention that the criterion is not suited to the case of a finite space conditional distribution (see example 2 in section S.4 in the supplementary materials for more details).

3.2 Algorithm initialization

All EM‐based procedures look for local maxima of their objective function and careful initialization is a key in their success. For the static SBM, VEM procedures often rely on a k‐means algorithm on the adjacency matrix to obtain an initial clustering of the individuals. In our context, the dynamic aspect of the data needs to be properly handled. We choose to initialize our VEM procedure by running k‐means on the rows of a concatenated data matrix containing all the adjacency time step matrices urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0160 stacked in consecutive column blocks. As a result, our initial clustering of the individuals is constant across time (namely urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0161 does not depend on t). A consequence of this choice is that this initialization works well when the group memberships do not vary too much across time (see Section 4 where we explore different values of transition matrix π). In practice, real life contact networks will either exhibit nodes that do not change group at all (see Section 5) or nodes that leave a group and then come back to this group. Our initialization is efficient in these cases. Another consequence is that, whereas we would expect the performances of the procedure to increase with the number T of time steps, we sometimes observe on the contrary a decrease in these performances. This is because increasing T also increases the probability for an individual to change group at some point in time and thus, starting with a constant‐in‐time clustering of the individuals, it becomes more difficult to infer the group memberships at each time point correctly (see in Section 4 the difference between results for T=5 and T=10).

To conclude this section, we mention that initialization is also a crucial point for other methods and we discuss in the next section its effect on the algorithm that was proposed in Yang et al. (2011).

4 Synthetic experiments

The methods that are presented in this paper are implemented in an R package which is available from http://lbbe.univ-lyon1.fr/dynsbm. Although the complexity of the estimation algorithm is urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0162, the computation time remains acceptable for networks with a few thousand nodes (see the on‐line supplementary Fig. 1).

4.1 Clustering performances

In this section, we explore the performances of our method for clustering the nodes across the different time steps. For this, we shall consider two different criteria. We rely on the adjusted Rand index (ARI) (Hubert and Arabie, 1985) to evaluate the agreement between the estimated and the true latent structure. This index is smaller than 1, two identical latent structures (up to label switching) having an ARI equal to 1. Note that it can take negative values and is built on the Rand index with a correction for chance. Now there are two different ways of using the ARI in a dynamic set‐up. Following Yang et al. (2011) and Xu and Hero (2014) we first consider an averaged value over the different time steps 1⩽tT of ARIurn:x-wiley:13697412:media:rssb12200:rssb12200-math-0163 computed at time t. In this approach the dynamic set‐up may be viewed as a way of improving the node clustering at each time step over a method that would cluster separately the nodes at each time step. However, this averaged index does not say anything about the smooth recovery of group memberships along time. In particular, it is invariant under local switching on the SBM part of the parameter (see Section 2.3). Thus we also consider the global ARI value that compares the clustering of the set of nodes for all time points with the true latent structure. Obviously, good performances for this criterion are more difficult to obtain.

We use synthetic data sets created as follows. We consider binary graphs with N=100 nodes and T ∈ {5;10} different time steps. We assume Q=2 latent groups with three different values for the transition matrix π:

These three cases correspond respectively to low, medium and high group stability. Namely, in the first case, individuals are more likely to change group across time, resulting in a more difficult problem from the point of view of the initialization of our algorithm (see Section 3.2). The stationary distribution in those three cases is urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0165 so the two groups have similar proportions.

As for the Bernoulli parameters β, we explore four different cases (Table 1) representing different levels of difficulty, plus a specific example of affiliation for which we recall that the parameters are not identifiable in the dynamic setting. We note that the affiliation case satisfies the separability condition that was established for sparse planted partition models (see Mossel et al. (2014) for more details). This means that static reconstruction of the groups is conjectured to be possible (and we consider that this static problem corresponds to medium difficulty).

Table 1. Bernoulli parameter values in four cases, plus an affiliation example
Easiness urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0166 urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0167 urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0168
Low− 0.2 0.1 0.15
Low+ 0.25 0.1 0.2
Medium− 0.3 0.1 0.2
Medium+ 0.4 0.1 0.2
Medium with 0.3 0.1 0.3
affiliation

For each combination of (π,β), we generate 100 data sets, estimate their parameters, cluster their nodes and report in Fig. 3 boxplots of a global and of an averaged ARI value.

Boxplots of global ARI (□) and averaged ARI (image) in various set‐ups (in each panel, from left to right, the results correspond to β=low−, low+, medium−, medium+ and the affiliation case respectively): (a) low group stability (urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0169), T=5 time points; (b) low group stability, T=10; (c) medium group stability (urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0170), T=5; (d) medium group stability, T=10; (e) high group stability (urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0171), T=5; (f) high group stability, T=10

Fig. 3 confirms that it is more difficult to obtain a smooth recovery of the groups (measured through the global ARI) than a local recovery (measured through the averaged ARI), the former values being globally smaller than the latter. In particular in the affiliation model, we observe that, whereas the averaged ARI is quite good (all values close to 1), the global ARI can be low (e.g. with low group stability or medium group stability and T=10 time points). However, in the identifiable cases, we obtain good performances for this global index (values above 0.8) when group stability is not too low or when connectivity parameters are sufficiently well separated (medium β‐values). As expected, the clustering performances increase (i.e. ARI values increase) with group stability (from π low to high) and with a better separation between the group connectivity behaviours (from low− to medium+ easiness). When increasing the number of time points from five to 10, clustering indices tend to be slightly larger, exhibiting a smaller variance. However, this is not always so: for instance, with low or medium group stability and β=low+, we observe that the performances decrease from five to 10 time points (smaller ARI values). We believe that this is due to the initialization of our procedure: with T=10 time points, it is more likely that the group memberships differ from their initial value. As we use as a starting point a constant with time value for these memberships, our algorithm is farther from the optimal value.

Mean‐squared errors (MSEs) for estimation of the transition parameter π are given in the on‐line supplementary Fig. 2. We show MSEs for π only, as the MSEs for (β,γ) are strongly correlated with the clustering results. This figure shows that, when groups are not globally recovered, the MSE values may be high (up to 15%). However, in most of the cases, these MSEs are rather small (less than 2%) so that the dynamics of the group memberships is captured.

Now, we compare our results with those of other procedures. The models of Yang et al. (2011) and Xu and Hero (2014) are the closest to our set‐up. Since Xu and Hero (2014) obtained performances that were comparable with those of Yang et al. (2011), we focus on the latter here. (In fact, Xu and Hero's method is faster, with slightly lower clustering performances than that of Yang et al. (2011).) Thus, we use the off‐line version of the algorithm that was proposed in Yang et al. (2011) (MATLAB code is available from homepage.cs.uiowa.edu/∼tymg/codes). We ran their code on the same set‐up as above. When relying on default values of the algorithm, the results obtained are very poor, with ARI values that are smaller than urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0172 in general (the data are not shown). We note that Yang et al. (2011) did not discuss initialization and simply proposed to start with a random partition of the nodes, which proves to be a bad strategy. To make fair comparisons, we thus decided to combine their algorithm with our initialization strategy. Results are presented in the on‐line supplementary Fig. 3.

From Fig. 3 in the supplementary material we can see that, putting apart our initialization strategy, our procedure outperforms that of Yang et al. (2011) (they globally have much lower ARI). Indeed, their method obtains good performances only in a few cases: urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0173, T ∈ {5,10}), urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0174 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0175. In all these cases, we can see that the method's performances are due to very good initialization. Now, when the true classification is farther from initialization, the performances considerably drop. In particular, for intermediate cases (e.g. medium group stability or high group stability with T=10), we can see that our method still succeeds in obtaining a good partition (Fig. 3) whereas this is not so for the method of Yang et al. (2011) (supplementary Fig. 3).

4.2 Model selection

We simulate a binary dynamic data set with Q=4 groups, the transition matrix between states satisfies urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0176 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0177 for ql. Bernoulli parameters are chosen as follows: we draw independent and identically distributed random variables urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0178 and then choose

We generate 100 data sets under this model and estimate the number of groups relying on the ICL criterion. Results are presented in Fig. 4. We observe that the correct number of groups is recovered in 88% of the cases (Fig. 4(a)). Moreover, Fig. 4(b) shows that, when ICL selects only three groups, the ARI of the classification with four groups is rather low (less than 80%). This shows that, in those cases, classification with four groups is not the correct classification, so the VEM algorithm seems responsible for bad results (the optimum has not been reached) more than the penalization term.

Estimation of the number of groups via the ICL criterion: (a) frequency of the selected number of groups; (b) ARI of the classification obtained with four groups depending on the number of groups selected

5 Revealing social structure in dynamic contact networks

Dynamic network analysis has recently emerged as an efficient method for revealing social structure and organization in humans and animals. Indeed, many studies are now beyond the analysis of static networks and take advantage of longitudinal data in the long term, for instance during days or years of observations, that allow for constructing dynamic social networks. In particular, contact networks built from field observations of association between animals or from sensor‐based measurements are now currently available in ecology or sociology. In this section, we show that our statistical approach is a suitable tool to analyse dynamic contact networks from the literature.

5.1 Encounters among high school students

Describing face‐to‐face contacts in a population (in our case, a classroom) can play an important role in

  1. understanding whether there is a peculiar non‐random mixing of individuals that would be a sign for a social organization and
  2. predicting how infectious diseases can spread, by studying the cross‐link between the contact dynamics and the disease dynamics.

As a first step, it is therefore mandatory to find an appropriate model to analyse these contacts and we propose to use our dynamic SBM to achieve this step.

The data set consists of face‐to‐face encounters of high school students (measured through the use of wearable sensors) of a class from a French high school (see Fournet and Barrat (2014) for a complete description of the experiment). In this class called ‘PC’ (as students focus on physics and chemistry), interactions were recorded during 4 days (Tuesday–Friday) in December 2011. We kept only the 27 (out of 31) students who appear every day, i.e. that have at least one interaction with another student during each of the 4 days. Interaction times were aggregated by days to form a sequence of four different networks. These are undirected and weighted networks, the weight of an interaction between two individuals being the number of interactions between these two individuals divided by the number of time points for which at least two individuals interacted; thus a non‐negative real number that we call interaction frequency. After examination of the distribution of these weights, we choose to discretize these data into M=3 bins (see example 2 in section S.4 in the on‐line supplementary materials) corresponding to low, medium and high interaction frequency. As already mentioned, our model selection criterion is not fitted to this case (see section S.4 in the supplementary materials for more details). We thus choose to rely instead on the ‘elbow’ method, applied to the complete‐data log‐likelihood. It consists of identifying a change of slope in the curve that represents this complete‐data log‐likelihood for different values of Q. The method selects Q=4 groups (see supplementary Fig. 4) and we now present the results that were obtained with our model fitted with Q=4 groups.

We observe that groups 2 and 3 are composed of students who are likely to interact (i.e. urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0180 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0181 are close to 1; Fig. 5). Furthermore, the frequency of their interactions inside their groups is higher than in the rest of the network (urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0182 for q=2,3 in Fig. 5). These two groups form two communities such as defined in Fortunato (2010). Moreover, we observe that both groups include a certain number of individuals (3 and 4 respectively) who permanently stay in the group over time (Fig. 6). These individuals may play the role of ‘social attractors’ or ’core leaders’ around which the other students are likely to gravitate. Group 4 displays a similar pattern of community structure, with much less interaction (intermediate value of urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0183) but also a significant level of interaction with group 2 (Fig. 5). Interestingly, groups 2 and 4 also exchange students over time (see the fluxes between groups in Fig. 6) and this could reflect some co‐operation or affinity between the students of these two groups. Group 1 is quite stable over time (seven permanent members; see Fig. 6) and is characterized by a low rate of interaction inside and outside the group (low urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0184‐values in Fig. 5). It clearly gathers isolated students, but this does not mean that they do not interact with any student; they usually do so, but with a small number of partners. Therefore, we not only reveal evolving communities (such as in Yang et al. (2011)) but we also highlight the dynamics of aloneness inside this class.

Summary of the interaction parameters urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0185 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0186 estimated by our model with Q=4 groups on the data set of interactions in the PC class (Fournet and Barrat, 2014): in each cell (q,l) with 1⩽ql⩽4, there are T=4 bar plots corresponding to the four measurements (Tuesday–Friday); each bar plot represents the distribution of the parameter urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0187 for the three categories of interaction frequency (low, medium and high); the width of each bar plot is proportional to the sparsity parameter urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0188; we recall that, when considering the diagonal cells (q,q), parameters do not depend on t anymore
Alluvial plot showing the dynamics of the group membership estimated by our model on the data set of interactions in the PC class (Fournet and Barrat, 2014): each curve is a flux that represents the move of one or more students from a group to another group (Dik indicates group k for day i); the thickness of the curves is proportional to the number of students and the total height represents the 27 students

We now investigate whether gender differences may help in (a posteriori) explaining or refining the interaction patterns that we reveal. We first note that group 3 is exclusively composed of male students (on‐line supplementary Fig. 5). This observation along with the previous conclusions suggests that group 3 may be a closed or exclusive male community. Meanwhile, some of these male students move to group 1 which is partly composed of a ‘backbone’ of female students who stay in group 1 (supplementary Fig. 5). Moreover, we clearly observe that female students are likely to stay in their group (most of the moves between groups are realized by males), and that a majority of them are in low interacting groups 1 and 4. But no female students move between these two groups, which supports a clear dichotomy pattern in the female organization with respect to male organization. In summary, we show evidence for some gender homophily (see Fournet and Barrat (2014) for a precise definition), i.e. gender is a key factor for explaining the dynamics of the interactions between these young adults.

Lastly, we note that both details captured by our model (say β and γ) are often convergent or correlated in this case, but we note that studying this network with a binary model (i.e. not considering the interaction frequency) does not allow us to capture interesting structure (the data are not shown). Therefore, the presence or absence of an interaction as well as its frequency are important and require explicit modelling such as in our approach.

5.2 Social interactions between animals

Interactions between animals are dynamic processes. How and why the topology of the network changes (or not) over time is of primary interest to understand animal societies. Here we analyse two data sets of animal contact networks: the first dealing with migratory birds (sparrows) and the second with Indian equids (onagers). Both data sets are analysed with the extended model presented in section S.5 of the on‐line supplementary materials.

Sparrows were captured and marked during the winters of 2010–2012 in a small area (the University of California, Santa Cruz Arboretum; Shizuka et al. (2014)). During these three seasons, Shizuka and colleagues recorded bird interactions (into flocks, i.e. individuals in the same place at the same time) and they aggregated their observations by season. They observed 69 birds in total, but there was a significant turnover of birds due to mortality and recruitment and only some of these 69 birds are present at each season (31, 46 and 27 birds in the first, second and third seasons respectively). The data set is therefore composed of T=3 undirected and weighted networks, with specified presence or absence of nodes at each of the three time steps. Edges are weighted by the number of times that pairs of birds have been seen together at the same place and time (if 0, there is no edge). Shizuka et al. (2014) identified reassembly of same communities (as defined previously) across seasons despite the turnover of birds. This stability is due to social preferences across years between individuals that rejoin the community in the same area of the site (Shizuka et al., 2014). Our model is a perfect candidate to fit these observations: indeed, constraints from equation 2 are appropriate in this case where the communities keep existing over time (and therefore the parameters remain stable over time) but the membership is evolving (in particular, due to the presence or absence of birds in the three seasons). As previously, we discretized the edge weights into M=3 bins (low, medium and high interaction level) and we selected Q=4 groups with the ‘elbow’ method (the data are not shown). Examination of the estimated parameters urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0189 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0190 (Fig. 7(a)) reveals that groups 2, 3 and 4 are clear communities (with different intragroup behaviours) that eventually correspond to those revealed by Shizuka et al. (2014). For most of that, our method proposes to gather peripheral birds into group 1. Clearly, we observe some stability across years with individuals staying in communities 2, 3 and 4 over time (see the horizontal fluxes in Fig. 8(a)) and that are joined by incoming birds (see fluxes from the fake group 0 of absent birds in Fig. 8). All these observations confirm the analysis in Shizuka et al. (2014) and demonstrate that our modelling approach is particularly suited to such data sets.

Summary of the interaction parameters urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0191 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0192 estimated by our model with (a) Q=4 groups on the data set of sparrows (Shizuka et al., 2014) and (b) Q=3 groups on the data set of onagers (Rubenstein et al., 2015) (same principle as in Fig. 5)
Alluvial plot showing the dynamics of the group membership estimated by our model on the data sets of interactions between (a) 69 sparrows (Shizuka et al., 2014) and (b) 23 onagers (Rubenstein et al., 2015) (same principle as in Fig. 6 (with Sik and Mik indicating group k at season or month i)): a fake group (group 0) gathers absent animals at a specific time step and fuzzy fluxes represent arrival or departure to or from a group from or to group 0 respectively

Onagers were observed in the Little Rann of Kutch, which is a desert in Gujarat, India (Rubenstein et al., 2015). Each time a herd (group) was encountered, association between each pair of individuals in the group was recorded. We retained the data association of 23 individuals that were present at least once between February and May 2003 and we aggregated interactions by month. The data set contains therefore T=4 undirected and weighted networks, with specified presence or absence of nodes each month. Edges are weighted by the number of times that pairs of onagers belong to the same herd (if 0, there is no edge). Again, we discretized the edge weights into M=3 bins (low, medium and high interaction level) and we selected Q=3 groups. Visual inspection of the estimated parameters (Fig. 7 (b)) shows that cluster 1 gathers peripheral onagers that can actually stay away from the others because predators have been extirpated from this habitat (and so, no collective protection strategy is required; Rubenstein et al. (2015)). Cluster 2 is composed of followers onagers which have some interactions between them and much more with those of group 3 whereas onagers in group 3 form a rich club community (i.e. a clique of hubs as defined in Colizza et al. (2006)), with high values of estimates urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0193 and urn:x-wiley:13697412:media:rssb12200:rssb12200-math-0194. This community is evolving over time by integrating one or two onagers during successive months. Interestingly, the social integration process is revealed and is somewhat hierarchical: previously absent onagers (fake group 0 in Fig. 8(b)) are likely to integrate with group 1, onagers of group 1 can possibly move to the followers group (i.e. group 2) and a few followers can be integrated over time with the central rich club community (group 3). Again, the structure of the onagers social network remains persistent over time (see similar conclusions in Rubenstein et al. (2015)) and our model is therefore particularly adapted and efficient in this case.

Acknowledgements

We thank Tianbao Yang for making his code available from his Web page. We sincerely acknowledge the SocioPatterns collaboration for providing the high school data set (http://www.sociopatterns.org) and especially Alain Barrat for interesting discussions. This work was performed by using the computing facilities of the Centre de Calcul, Laboratoire de Biométrie et Biologie Évolutive–Purn:x-wiley:13697412:media:rssb12200:rssb12200-math-0195le Rhurn:x-wiley:13697412:media:rssb12200:rssb12200-math-0196ne‐Alpes de Bioinformatique.

    Number of times cited: 7

    • , Global spectral clustering in dynamic networks, Proceedings of the National Academy of Sciences, 115, 5, (927)
    • , (Q, S)-distance model and counting algorithms in dynamic distributed systems, International Journal of Distributed Sensor Networks, 14, 1, (155014771875687)
    • , Constrained common cluster based model for community detection in temporal and multiplex networks, Neurocomputing, 275, (768)
    • , Dealing with reciprocity in dynamic stochastic block models, Computational Statistics & Data Analysis
    • , Revealing the hidden structure of dynamic ecological networks, Royal Society Open Science, 4, 6, (170251)
    • , Dynamic stochastic block models: parameter estimation and detection of changes in community structure, Statistics and Computing
    • , Multiple change points detection and clustering in dynamic networks, Statistics and Computing