Summary

Networks with a very large number of nodes appear in many application areas and pose challenges for traditional Gaussian graphical modelling approaches. In this paper, we focus on the estimation of a Gaussian graphical model when the dependence between variables has a block‐wise structure. We propose a penalized likelihood estimation of the inverse covariance matrix, also called Graphical LASSO, applied to block averages of observations, and we derive its asymptotic properties. Monte Carlo experiments, comparing the properties of our estimator with those of the conventional Graphical LASSO, show that the proposed approach works well in the presence of block‐wise dependence structure and that it is also robust to possible model misspecification. We conclude the paper with an empirical study on economic growth and convergence of 1,088 European small regions in the years 1980 to 2012. While requiring a priori information on the block structure – e.g. given by the hierarchical structure of data – our approach can be adopted for estimation and prediction using very large panel data sets. Also, it is particularly useful when there is a problem of missing values and outliers or when the focus of the analysis is on out‐of‐sample prediction.

1. Introduction

Estimation of large covariance matrices and their inverse has several applications in various areas, from economics and finance to health, biology, computer science and engineering. One important technique developed by the statistical and computer science literature is the graphical modelling approach, which aims at exploring the relationships among a set of random variables through their joint distribution. Under this framework, the Gaussian distribution is often assumed and, in this case, the dependence structure is completely determined by the covariance matrix, or, equivalently, by its inverse, where the off‐diagonal elements are proportional to partial correlations Lauritzen, (1996). Specifically, variables i and j are conditionally independent given all other variables, if and only if the (i,j)th element of the inverse covariance matrix, referred to as the precision matrix, is zero. One result in the Gaussian graphical modelling literature is that there is a one‐to‐one correspondence between the joint Gaussian distribution of a vector of random variables and its conditional Gaussian distribution. Under the latter, the distribution of a variable observed in a certain node, given values observed in all other nodes, depends only on the observations in its neighbourhood; see, e.g. Mardia (1988) and Meinshausen and Buhlmann (2006). Hence, the problem of estimating the (inverse) covariance matrix is equivalent to a neighbourhood selection problem. This observation has led to efficient node‐wise LASSO approaches for sparse high‐dimensional graphs; see, e.g. Meinshausen and Buhlmann (2006) and Peng et al. (2009). In contrast to these approaches, Friedman et al. (2008) have developed the Graphical LASSO (GLASSO) approach, where the inverse covariance matrix is directly estimated via penalized likelihood.

Conditional Gaussian models are known in the spatial econometrics literature as conditional autoregressive (CAR) models, representing data from a given spatial location as a function of data in neighbouring locations; see, e.g. Cressie (1993) and Anselin (2010). In a CAR model, the neighbourhood structure is represented by means of the so‐called spatial weights matrix, usually assumed to be known a priori using information on distance between units, such as the geographic, economic, policy or social distance. It is interesting to observe that the problem of estimating the spatial weights matrix in a CAR model is equivalent to a neighbourhood selection problem in a graphical model; for more details, see Section 5. Hence, the spatial weights matrix for CAR models can be estimated by using methods from the Gaussian graphical modelling literature for estimating inverse covariance matrices. While the spatial econometrics literature has been largely immune to the developments in Gaussian graphical modelling, these methods may be useful for a large number of applications in the social sciences.

In this paper, we consider the case of networks with a very large number of nodes and we focus on the estimation of Gaussian graphical models when the dependency between variables has a block‐wise structure. We assume that units can be split into a set of non‐overlapping groups, or blocks, in such a way that the dependence between units only varies across blocks, instead of individual observations. Hence, rather than estimating the links between each pair of units in the sample, we propose to estimate the dependence (links) between groups of cross‐sectional units. Our approach consists of applying the GLASSO methodology of Friedman et al. (2008) to block‐level averages of observations rather than to single observations. When the size of the group is unity, our method collapses to the conventional GLASSO. A major advantage of this method is that its computational cost is greatly reduced and hence it can be adopted for estimation and prediction using very large, or huge, networks. Our approach is also particularly useful when there is a problem of missing values and outliers or when the focus of the analysis is out‐of‐sample prediction.

There exist several examples where it is reasonable to assume a block‐wise dependence structure between units. In economics, preferences for consumer goods of individuals belonging to the same household may react similarly in response to consumption decisions of neighbouring households. Companies belonging to the same sector of economic activity and located within the same geographical area (e.g. the postcode, the region or the country) tend to behave similarly because they have similar characteristics or face similar opportunities and constraints. Thus, it is reasonable to assume that the way they interact with companies from other sectors and/or geographical areas is similar. A block‐wise dependence structure is also a realistic assumption when the variable of interest displays an explicit hierarchical or group membership structure, namely, clustering of units in an organized fashion, such as students within classrooms, members of a household, General Practitioners in a clinic, etc. This is common, for example, when dealing with large, individual‐level, microeconomic or health data sets. Other examples are in neuroscience, where the networks used to represent brain activity have a hierarchical structure, with billions of neurons connected to each other through hub nodes, called voxels, and with connected voxels forming areas that are again connected with each other Luo, (2015). In biology, regulatory networks are thought to have a hub‐type structure, with groups of genes having a similar dependency structure and regulated by a small number of unobserved proteins (Hao et al., 2012). When the grouping is not fully known a priori, we could use methods that allow us to determine endogenously the optimal grouping of cross‐sectional units, by employing techniques from the clustering literature; see, e.g. Lin and Ng (2012), Bonhomme and Manresa (2015) and Ando and Bai (2016).

Exploitation of a priori information on the group structure of variables is not new in the social interaction literature and in the statistical and graphical modelling literature. Empirical works from the social interaction literature typically assume that an individual reacts to the average of others in a predefined group; see Durlauf and Young (2001) and Blume et al. (2013) for a review. Such an assumption implies that the spatial weights matrix has a group‐membership structure, where the weights are identical for all units belonging to the same group, while they are set to zero for the interaction between units belonging to different groups. Lee and Yu (2007) considered the identification and estimation of interaction effects in the context of a spatial autoregressive model where the spatial weights matrix (and the associated precision matrix) has such a block diagonal structure with equal entries. Note that this is a more restrictive assumption to that used in this paper, as it does not allow for dependences between groups. Nevertheless, this model has been widely adopted in several different areas of the social sciences, such as education (Calvó‐Armengol et al., 2009), labour market outcomes (Bayer et al., 2008), crime Sirakaya, (2006) and welfare participation (Bertrand et al., 2000). Similar models have been proposed by the statistical literature, where mixed effect models are commonly used to represent variables with a hierarchical or known group membership structure Goldstein, (2011). When the random effects are assumed to be correlated, these models lead to a covariance matrix that has a block‐wise structure of the same type that we use in this paper, with equal correlation within groups and equal correlation between any two elements of two specified groups Laird and Ware, (1982). Maximum likelihood approaches are typically used for parameter estimation in these models. In the case of a large number of regressors, penalized approaches based on the L1 penalty are used for estimation and variable selection (Schelldorfer et al., 2014). However, these methods typically require a small number of random effects (blocks).

A number of authors in the literature on graphical modelling have proposed sparse estimation of graphs with a block structure. These methods exploit a priori information on group membership of observations to propose fast, sparse estimation algorithms. Guo et al. (2011) consider a heterogeneous data set where variables, while independent across groups, have a sparse dependency structure within group. The corresponding precision matrix has a block diagonal structure, and the authors propose joint estimation of various blocks by maximizing the corresponding penalized log‐likelihood functions. A similar approach is taken by Mazumder and Hastie (2012), who propose thresholding estimation of a sparse inverse covariance that is a block diagonal matrix of connected components. Wit and Abbruzzo (2015) impose block equality constraints on the parameters of an undirected graphical model to reduce the number of parameters to be estimated. Vinciotti et al. (2016) discuss various forms of block structures for dynamic networks and propose estimation of the associated precision matrix under sparsity and equality constraints on parameters (also known as parameter tying). The inclusion of equality constraints, while reducing the number of parameters, often increases the computational complexity of the estimation procedures. For example, the general block structures considered by Wit and Abbruzzo (2015) and Vinciotti et al. (2016) imply a computational cost of the estimation procedure that is higher compared to the approaches of Guo et al. (2011) and Mazumder and Hastie (2012), where the assumed block structure allows the large GLASSO problem to be split into many, smaller tractable problems.

In this paper, we use block structures with the intent to achieve computational efficiency, allowing us to infer networks of very large dimensions. Differently from Guo et al. (2011) and Mazumder and Hastie (2012), our approach does not need to impose block‐diagonality of the precision matrix. However, we assume that units can be split into groups in such a way that the covariance (and associated precision matrix) only varies across blocks, rather than individual observations.

The rest of the paper is structured as follows. In Section 2, we describe the main features of our graphical model with block‐wise dependence structure, while in Section 3 we propose our estimator based on GLASSO. In Section 4, we run Monte Carlo experiments to investigate the small‐sample properties of the proposed estimator. In Section 5, we carry out an empirical study on the economic growth of a set of small regions in Europe. Finally, in Section 6, we provide some concluding remarks. The Appendix provides the proofs.

We use |λ1(A)||λ2(A)||λn(A)| to denote the eigenvalues of a matrix AMn×n, where Mn×n is the space of n×n matrices. Tr(A) is the trace of AMn×n, while its Frobenius norm is AF=(i,j=1maij2)1/2. K is used for a fixed positive constant that does not depend on N; Sc is used to denote the complement of a set S.

2. Block‐Wise Dependence Structure in Huge Networks

Let yit be the observed data for the ith individual, i=1,2,,N, at time t, with t=1,2,,T, and assume that the N‐dimensional vector yt=(y1t,y2t,,yNt)N(μ,Σ), where Σ is an N×N symmetric and positive definite matrix, independent of t. For ease of exposition, we set μ=0, although this assumption can be relaxed by setting μ to a non‐zero vector depending on a set of strictly or weakly exogenous regressors, including, for example, temporal lags of the dependent variable. Assume that the variables can be split into G non‐overlapping groups, with GN, such that the dependence between individuals belonging to different groups is the same for all individuals belonging to the same group. Suppose, for simplicity, that all groups are of the same size M=N/G, where M is an integer number. Under this assumption, Σ has the following block‐wise structure:
ΣN×N=σ1σ121Mσ1G1Mσ211Mσ2σ2G1MσG11MσG21MσG,
(2.1)
where 1M is an M×M matrix of ones, and
σgM×M=δgσggσggσggδgσggσggσggδg,
(2.2)
where σgg are intra‐group covariances, while δg are group‐specific variances, for g=1,2,,G. Let
ΣGG×G=σ11σ21σ1Gσ12σ22σ2GσG1σGG,ΓGG×G=γ1000γ200γG,
(2.3)
where γg=δgσgg0. Then, Σ can be written in compact form as
Σ=(ΣG1M)+(ΓGIM),
(2.4)
where ΣG is a G×G matrix assumed to be positive definite. If Σ has the above block‐wise structure, then its inverse, namely the precision matrix, is also block‐wise. To show this, rewrite
Σ=(MΣG1M1M)+(ΓG1M1M)(ΓG1M1M)+(ΓGIM)=((MΣG+ΓG)1M1M)+(ΓG(IM1M1M)).
(2.5)
Noting that (1/M)1M and (IM(1/M)1M) are idempotent matrices such that their sum is the identity matrix, we can apply Lemma 2.1 (point (iv)) in Magnus (1982) to obtain
Θ=Σ1=((MΣG+ΓG)11M1M)+(ΓG1(IM1M1M)).
(2.6)
Assuming that the matrix (ΣG+(1/M)ΓG)1 has generic elements φgh, the likelihood function has the simplified expression:
l(θ)ln|MΣG+ΓG|(M1)ln|ΓG|1MTt=1Tg=1G×(1Mh=1Gig:jhyityjtφgh+(M1)igyit2γg1ij:i,jgyityjtγg1).
(2.7)
See Appendix  A for a proof. Below, we propose a penalized maximum likelihood approach to estimate Σ and Θ, which exploits the block‐wise dependence structure and is based on the GLASSO.

3. Block‐Glasso Approach

To propose our estimator, consider the group averages
y¯gt=1Migyit,
(3.1)
and note that, if ytN(0,Σ), where Σ is given by 2.5, then also y¯G,t=(y¯1t,y¯2,,y¯Gt)N(0,ΨG), where ΨG is a G×G, positive definite matrix with elements
ψgh=1M2ig,jhσij=σgh,forgh,
(3.2)
ψgg=1M2i,jgσij=σgg+1Mγg,
(3.3)
or, in matrix form,
ΨG=ΣG+1MΓG.
(3.4)
It follows that we can estimate Σ by applying the GLASSO to the vector of group means, y¯G,t. More specifically, consider the following two‐step procedure.

  • Step 1. Estimate ΦG=ΨG1 by applying the GLASSO to y¯G,t, t=1,2,,T. This allows us to obtain σ̂gh for gh=1,2,,G, and ψ̂gg,g=1,2,,G.

  • Step 2. Estimate γg by exploiting identity 2.4 and 3.4. Noting that E[(1/MT)igt=1Tyit2]=σgg+γg, while E[(1/MT)igt=1Ty¯gt2]=σgg+(1/M)γg, we can consider the following estimator for γ̂g:
    γ̂g=MM1(1MTigt=1Tyit2ψ̂gg),g=1,2,,G.
    (3.5)
    Hence, use 2.6 to recover Θ̂:
    Θ̂=(1MΦ̂G1M1M)+(Γ̂G1(IM1M1M)).
    (3.6)

In Step 1, the estimator that maximizes the penalized likelihood for y¯G,t is
Φ̂G=maxΦG0{ln|ΦG|Tr(SGΦG)ρGg,h=1,ghG|φgh|},
(3.7)
where the maximization is taken over symmetric positive definite matrices, SG is the sample covariance matrix, and ρG is the tuning parameter controlling the degree of the sparsity in the estimated inverse covariance matrix.

The following theorems derive the asymptotic properties of estimator 13 when both N and T go to infinity.  

Theorem 3.1.. Consistency
Let ytN(0,Σ) where Σ has the block structure in 2.5, with ΣG given by 2.3 being a symmetric, positive definite matrix such that λ1(ΣG)<K<. Let g,h=1,ghG1{φgh0}=sG, where φgh are the elements of ΦG. Let Θ̂ be an estimate of Θ following Steps 1 and 2, where ρG=O((lnG/T)), with ρG being the tuning parameter in 3.7. Then, we have
Θ̂ΘF=Op(1M(G+sG)lnGT).
(3.8)

 

Theorem 3.2.. Sparsistency

Suppose all conditions in Theorem 3.1 hold, and that Φ̂GΦG2=O(ηG) where ηG is such that ρG=O((lnG/T)+ηG), with ρG being the tuning parameter in 3.7. Let S={(i,j):ij,θij=0} be the set of indices of all non‐zero off‐diagonal elements in Θ. Then, with probability tending to 1 we have θ̂ij=0 for all i,jSc.

Theorem 3.2 is a straightforward consequence of the sparsistency theorem of Lam and Fan (2009) applied to Φ̂G; see also Rothman et al. (2008) and Guo et al. (2011).

Hence, for Θ̂ to be a good proxy of Θ, G needs to be small (or, equivalently, M large) and ΦG needs to be a sparse matrix, as measured by sG. Note, however, that, from 2.6, the off‐diagonal elements of Θ are proportional to 1/M2. Hence, for fixed G, as M increases the (relative) effect of each individual neighbour on each unit would disappear and, in the limit, the precision matrix would become a diagonal matrix. A similar result has been obtained by Lee (2002) in the context of a spatial autoregressive (SAR) model where each spatial unit is influenced aggregately by a significant portion of other spatial units in the sample. Lee (2002) showed that if each spatial unit in the limit has infinitely many neighbours (which would happen in our case for G fixed and M increasing), then the ordinary least‐squares (OLS) estimator for a SAR model would still be consistent and even asymptotically efficient. In Section 4, we investigate the properties of our estimator for different values of G relative to N.

A major advantage of our proposed estimation procedure is that it is considerably faster than the conventional GLASSO for estimating an N×N precision matrix. Using the algorithm proposed by Friedman et al. (2008), the computational cost associated with a coordinate descendent update would decrease from O(N2) to O(G2). This could decrease further to O(G) using faster algorithms, such as QUIC (Hsieh et al., 2014). Another advantage of our approach is that using block averages rather than single observations greatly helps in the presence of missing values, a common problem in statistical analysis. Exploiting group membership information is also very useful for prediction purposes on a hold‐out sample of units, for which the position in the (individual‐level) network is usually unknown. It is important, however, to remark that our approach requires a priori information on the block structure. If this is not available, then one could exploit methods from the clustering literature that allow us to determine endogenously the optimal grouping of cross‐sectional units, such as the k‐means algorithm Forgy, (1965) extended to allow for covariates in the model; see, in particular, Lin and Ng (2012) and Bonhomme and Manresa (2015), and also Ando and Bai (2016). Our approach also has potential application in the area of spatial econometrics. Given the equivalence between CAR models and the joint Gaussian distribution emphasized by many authors – see, among others, Mardia (1988) and Meinshausen and Buhlmann (2006) – this method provides a means for estimating spatial weights matrices in the context of very large panel data. Later in the paper, we offer a small empirical exercise using CAR models.

Finally, it is important to remark that our approach does not allow us to estimate consistently the precision matrix when this arises from one or more common, pervasive factors. Unobserved common factors occur in time series as a result of global shocks, namely unexpected events that may hit all statistical units, although with different intensities Stock and Watson, (2010). These large‐scale perturbations affect micro‐level population units and are often responsible for observable co‐movements of a large number of time series. We observe that our model is more parsimonious than the common factor specification and may be useful in situations where T is too short to allow for fully unrestricted common effects. However, in a large T setting, in the presence of unobserved common factors, our approach can be applied to de‐factored residuals, after estimating common factors using methods such as principal components Bai, (2003) or the Common Correlated Effects methodology Pesaran, (2006).

3.1. Case of Blocks With Unequal Size

Suppose now we have blocks with unequal size, so that group g has size Mg, with g=1,2,,G. In this case, group averages in 3.1 are based on Mg observations. By applying recursively the theorem for block matrix inversion – see Bernstein (2005) – it is easy to see that in the case of blocks of unequal size, a block‐wise structure for Σ still implies a block‐wise Θ. In the case of blocks with unequal size, a convenient representation of Σ can be obtained using selection matrices. Let Mmax=maxg=1,2,,G{Mg} and consider
ΣMmax=(ΣG1Mmax)+(ΓGIMmax).
(3.9)
Then Σ can be extracted as follows:
Σ=SΣMmaxS.
(3.10)
Here, S is an N×GMmax matrix of zeros and ones, selecting the correct number of rows and columns for each block in ΣMmax, depending on the group size. Note that SS=INT, and rewrite
Σ=(SΣMmaxS+INT)INT=S(ΣMmax+IGMmax)SINT,
(3.11)
where IGMmax is a GMmax×GMmax identity matrix. Using the matrix inversion lemma, we obtain
Θ=Σ1=S(ΣMmax+IGMmax)1SS1SINT,
(3.12)
where
(ΣMmax+IGMmax)1=((MmaxΣG+ΓG+IG)11Mmax1Mmax)+((ΓG+IG)1(IMmax1Mmax1Mmax))
(3.13)
and SS is a diagonal GMmax‐dimensional matrix of zeros and ones.1 Steps 1 and 2 outlined above can still be carried to obtain Φ̂G and Γ̂G, where now TMg observations are used to calculate γ̂g. The resulting Σ̂G can then be plugged into 3.12 and 3.13. From 3.12 and 3.13, it can be seen that consistency and sparsistency of the resulting estimator continue to hold with rates that now depend on N, G and Mmax.

3.2. Allowing for General Intra‐Block Correlation Structure

The approach outlined in Section 3 can be extended to allow for a general, intra‐block correlation matrix at the expense of reducing the computational efficiency. Suppose that
σij=σgg+πij,foralli,jg=1,2,,G,
(3.14)
σij=σgh,forallig,jh,withgh=1,2,,G.
(3.15)
Under this framework, while the covariance between variables of different blocks is constant for all variables belonging to the same block, the intra‐block covariance is allowed to vary across variables. In this case, the covariance matrix can be written as
Σ=(ΣG1M)+Π,
where Π is a block‐diagonal matrix.

We can show that, under the condition that (1/M)kgπik0 for all i, the Π matrix can be estimated by the covariance matrix of yity¯gt for each block. This results in a relatively easy implementation, whereby we first calculate y¯gt and apply the block‐GLASSO outlined in Section 3 to compute Σ̂G. Hence, we calculate the deviations of each value yit from its corresponding group‐level average, namely yity¯gt, and we apply the conventional GLASSO to all yity¯gt for each block, separately. This approach requires that πij, namely the deviations of σij from σgg, are not too large, so that the y¯gt can be used to consistently estimate σgg. The computational complexity of this procedure rises to O(G2)+O(GM2), as it is necessary to estimate G blocks of size M. In the rest of the paper, we refer to this approach as the flexible block‐GLASSO.

4. Monte Carlo Experiments

In this section, we provide Monte Carlo evidence on the properties of the above estimation procedure. We consider the following data‐generating process:
yit=αi+βxit+eit,i=1,2,,N;t=1,2,,T.
(4.1)
Here
xit=0.4xi,t1+vit,t=19,18,,1,0,1,2,,T
(4.2)
with αiIIDN(0,0.5), eitN(0,Σ) and vitN(0,ΣX). In generating xit, we set xi,20=0 and discard the first 20 observations to reduce the effect on estimates of initial values of xit. To generate Σ, we start from ΘG=ΣG1 and assume that its elements, θgh,GBin(1,(3/G)) for g,h=1,,G. We obtain Θ and Σ by applying 2.6, where we assume γGU(0.2,0.5). Letting D be the Choleski decomposition of Σ, namely Σ=DD, we generate et=Dεt, where εt=(ε1t,ε2t,,εNt), with εitIDN(0,1). We generate ΣX following the same procedure. As for β, in a first set of experiments we set β=0, and apply our methodology to yit, to test our procedure when there is no uncertainty regarding the mean of yit. We then set β=1 and apply our methodology to regression residuals after estimating β by OLS. As a robustness check, we carry an additional experiment where errors are non‐normally distributed. In this case, when generating eit, we set εit=(uit1)/2, with uitχ12. Model 4.14.2 has strictly exogenous regressors, an assumption that may not hold in practice. In a further set of experiments, we also consider a dynamic set‐up, where we assume that yit is generated by the first‐order autoregressive model
yit=αi+λyi,t1+eit,i=1,2,,N;t=1,2,,T,
(4.3)
where all elements are generated as above, and λ=0.4.
Finally, we examine the performance of the more general flexible block‐GLASSO approach outlined in Section 3.2 when Σ has a general intra‐block correlation structure. Under this experiment, all parameters are the same as in 4.1 and 4.2, with β=1 and
σij=σgg+πij,foralli,jg=1,2,,G,
(4.4)
σij=σgh,forallig;jh,withgh=1,2,,G.
(4.5)
We generate each block in Π by assuming that its inverse has elements distributed as Bin(1,(3/M)).

In each experiment, we compute the block‐GLASSO and the conventional GLASSO, for all pairs of N and T with N=50 and 100 and T=10, 50 and 200. As for the choice of G, we try G=N/2, and N/5. Each experiment is replicated R=250 times. We also carry out another set of experiments with N much larger than T, and set N=500, 1,000 and 2,000 and T=20. In this set of experiments, given the computational difficulties and poor performance in computing conventional GLASSO for such large networks, we only provide results for the block‐GLASSO. Under the dynamic set‐up 4.3, we only run experiments for large T (i.e. T=50 and 200) to avoid incurring bias of the OLS estimator for short panels.2

A number of statistics are used to assess the performance of our graph estimators. In terms of recovery of the network structure (provided by the non‐zero coefficients in Θ), we consider the receiver operating characteristic (ROC) curve, which plots the true positive rate (percentage of non‐zeros, i.e. links, correctly estimated as non‐zero) versus the false positive rate (percentage of zeros incorrectly estimated as non‐zeros), as the tuning parameter, ρG, varies. We summarize ROC curves by providing the maximum F1 score and the area under the curve (AUC), both averaged across the R replications. The F1 score is defined by (2TP/(2TP+FN+FP)), with TP, FP and FN being the true positive, the false positive and the false negatives (number of non‐zeros incorrectly detected as zeros), respectively. In terms of estimation of the precision matrix, we report the average entropy loss (EL) and the average Frobenius loss (FL), defined by
EL=Tr(Θ1Θ̂)ln|Θ1Θ̂|N,
(4.6)
FL=ΘΘ̂F2ΘF2.
(4.7)
When computing EL and FL, we use the rotation information criterion (RIC) – see Lysen (2009) – to select the optimal regularization parameter (and associated optimal precision matrix). Only for selected combinations of N and T, we also provide graphs with the ROC curves. As for β, we report bias, root mean square error (RMSE), empirical size and power of the OLS estimator of β and the feasible generalized least‐squares (GLS) estimator implemented using Θ̂ as estimate of Θ. In computing the empirical size, we set the nominal size to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95.

4.1. Results

The results are summarized in Tables 16 and Figures 12. The results from Table 1 show that, when data have block‐wise dependence structure, our method greatly outperforms the conventional GLASSO for all combinations of N, T and G. In particular, the F1 score and AUC show that block‐GLASSO has higher true positive rates and substantially lower false positive rates, while the EL and FL are always lower for block‐GLASSO, indicating that the latter provides a better estimation of the precision matrix. However, it is interesting to note that when T=10 and G=N/2 the block‐GLASSO does not perform well relative to other cases, and its properties are much worse than the case T=10 and G=N/5. More generally, Tables 1 and 2 show that for the same pair of N and T, the properties of block‐GLASSO deteriorate as G rises, thus confirming our theoretical results that, holding N and T fixed, the estimation error is higher when G is large or, equivalently, M small. This result is also confirmed by Figure 1, showing the ROC curves for the block‐GLASSO for varying N, T and G. As expected, the performance of the estimator improves as N increases (and hence M) for fixed T and G, and as T increases for fixed N and G, while it deteriorates as G rises, holding N and T constant.

Table 1.

Properties of block‐GLASSO and conventional GLASSO in model (4.1)–(4.2), β=0

NTGBlock‐GLASSOConventional GLASSO
F1AUCELFLF1AUCELFL
50200250.9290.8812.8940.0150.8690.55115.0630.491
50200100.9230.9060.8000.0030.6380.28519.6940.472
5050250.8280.8186.0990.0560.7190.50927.9180.679
5050100.8170.7861.5620.0100.6700.45727.6500.678
5010250.6650.40013.1670.5710.5780.17243.2320.829
5010100.7070.6403.6680.0630.5480.14765.2960.827
100200500.9480.8946.4580.0150.8630.52935.0850.538
100200200.9440.9121.9700.0030.6680.30345.4170.531
10050500.8190.77212.8550.0530.6890.41561.4530.717
10050200.8010.8123.8210.0100.5970.28184.8880.710
10010500.6200.20726.5700.6010.5230.07986.4850.827
10010200.6750.4758.2990.0640.4980.071135.1560.838

Note: F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average Frobenius Loss in 4.7.

Table 1.

Properties of block‐GLASSO and conventional GLASSO in model (4.1)–(4.2), β=0

NTGBlock‐GLASSOConventional GLASSO
F1AUCELFLF1AUCELFL
50200250.9290.8812.8940.0150.8690.55115.0630.491
50200100.9230.9060.8000.0030.6380.28519.6940.472
5050250.8280.8186.0990.0560.7190.50927.9180.679
5050100.8170.7861.5620.0100.6700.45727.6500.678
5010250.6650.40013.1670.5710.5780.17243.2320.829
5010100.7070.6403.6680.0630.5480.14765.2960.827
100200500.9480.8946.4580.0150.8630.52935.0850.538
100200200.9440.9121.9700.0030.6680.30345.4170.531
10050500.8190.77212.8550.0530.6890.41561.4530.717
10050200.8010.8123.8210.0100.5970.28184.8880.710
10010500.6200.20726.5700.6010.5230.07986.4850.827
10010200.6750.4758.2990.0640.4980.071135.1560.838

Note: F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average Frobenius Loss in 4.7.

Table 2.

Properties of block‐GLASSO with large N in model (4.1)–(4.2), β=0

NTGF1AUCELFL
50020500.6570.42113.7570.011
500201000.6560.24833.6600.029
500202500.6160.09294.2900.168
1,00020500.6490.40212.3060.005
1,000201000.6310.23230.0790.011
1,000202500.6130.09490.0200.040
2,00020500.6410.38811.4290.003
2,000201000.6240.22828.5230.010
2,000202500.6090.09087.7420.009

Note: F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average Frobenius loss in 4.7.

Table 2.

Properties of block‐GLASSO with large N in model (4.1)–(4.2), β=0

NTGF1AUCELFL
50020500.6570.42113.7570.011
500201000.6560.24833.6600.029
500202500.6160.09294.2900.168
1,00020500.6490.40212.3060.005
1,000201000.6310.23230.0790.011
1,000202500.6130.09490.0200.040
2,00020500.6410.38811.4290.003
2,000201000.6240.22828.5230.010
2,000202500.6090.09087.7420.009

Note: F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average Frobenius loss in 4.7.

Table 3.

Properties of OLS and GLS estimators of β and of block‐GLASSO applied to regression residuals in model (4.1)–(4.2), β=1

NTGOLSGLSBlock‐GLASSO
BiasRMSESize (%)Power (%)BiasRMSESize (%)Power (%)F1AUCELFL
50200250.0000.01314.80100.00.0000.0084.80100.00.9280.8812.9150.015
50200100.0000.01622.0099.600.0010.0084.60100.00.9180.8980.7940.003
5050250.0000.03019.6082.000.0000.0194.4092.400.8280.8186.1370.059
505010−0.0030.03623.6076.800.0020.0164.4096.000.8110.8291.6540.012
5010250.0050.05613.6047.200.0000.04810.4043.600.6670.40114.3320.924
501010−0.0200.08325.2047.200.0020.0384.4046.400.7030.6334.1730.113
100200500.0000.00915.60100.0−0.0010.0054.90100.00.9480.8956.5200.015
100200200.0000.01123.60100.00.0000.0064.50100.00.9370.9121.9900.002
1005050−0.0010.01718.4097.20−0.0010.0126.0099.200.8180.77412.8090.060
10050200.0020.02624.8091.200.0000.0115.40100.00.7980.8063.9250.012
10010500.0020.04416.8057.20−0.0010.0348.0055.600.6230.20729.3130.983
10010200.0010.06422.4059.200.0010.0335.2066.400.6720.4709.2520.111

Note: In calculating the empirical size, the nominal size is set to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95. F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average Frobenius loss in 4.7.

Table 3.

Properties of OLS and GLS estimators of β and of block‐GLASSO applied to regression residuals in model (4.1)–(4.2), β=1

NTGOLSGLSBlock‐GLASSO
BiasRMSESize (%)Power (%)BiasRMSESize (%)Power (%)F1AUCELFL
50200250.0000.01314.80100.00.0000.0084.80100.00.9280.8812.9150.015
50200100.0000.01622.0099.600.0010.0084.60100.00.9180.8980.7940.003
5050250.0000.03019.6082.000.0000.0194.4092.400.8280.8186.1370.059
505010−0.0030.03623.6076.800.0020.0164.4096.000.8110.8291.6540.012
5010250.0050.05613.6047.200.0000.04810.4043.600.6670.40114.3320.924
501010−0.0200.08325.2047.200.0020.0384.4046.400.7030.6334.1730.113
100200500.0000.00915.60100.0−0.0010.0054.90100.00.9480.8956.5200.015
100200200.0000.01123.60100.00.0000.0064.50100.00.9370.9121.9900.002
1005050−0.0010.01718.4097.20−0.0010.0126.0099.200.8180.77412.8090.060
10050200.0020.02624.8091.200.0000.0115.40100.00.7980.8063.9250.012
10010500.0020.04416.8057.20−0.0010.0348.0055.600.6230.20729.3130.983
10010200.0010.06422.4059.200.0010.0335.2066.400.6720.4709.2520.111

Note: In calculating the empirical size, the nominal size is set to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95. F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average Frobenius loss in 4.7.

Table 4.

Properties of OLS and GLS estimators of λ and of block‐GLASSO applied to regression residuals in model (4.3), λ=0.4

NTGOLSGLSBlock‐GLASSO
BiasRMSESize (%)Power (%)BiasRMSESize (%)Power (%)F1AUCELFL
5020025−0.0060.01733.2098.00−0.0010.0106.40100.00.9300.8822.8600.015
5020010−0.0080.02231.1092.000.0010.0115.60100.00.9120.8960.8030.003
505025−0.0220.03739.2049.00−0.0030.02110.0082.300.8280.8186.0420.057
505010−0.0190.04635.4056.000.0030.0205.2089.200.8080.8251.5960.011
10020050−0.0040.01127.10100.00.0000.0075.00100.00.9490.8956.5190.015
10020020−0.0050.01435.20100.00.0020.0064.60100.00.9370.9131.9980.003
1005050−0.0200.02749.5069.00.0000.0145.8098.100.8090.76812.7270.058
1005020−0.0210.03545.1067.00.0080.0144.20100.00.8000.8143.8500.011

Note: In calculating the empirical size, the nominal size is set to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95. F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average FL in 4.7.

Table 4.

Properties of OLS and GLS estimators of λ and of block‐GLASSO applied to regression residuals in model (4.3), λ=0.4

NTGOLSGLSBlock‐GLASSO
BiasRMSESize (%)Power (%)BiasRMSESize (%)Power (%)F1AUCELFL
5020025−0.0060.01733.2098.00−0.0010.0106.40100.00.9300.8822.8600.015
5020010−0.0080.02231.1092.000.0010.0115.60100.00.9120.8960.8030.003
505025−0.0220.03739.2049.00−0.0030.02110.0082.300.8280.8186.0420.057
505010−0.0190.04635.4056.000.0030.0205.2089.200.8080.8251.5960.011
10020050−0.0040.01127.10100.00.0000.0075.00100.00.9490.8956.5190.015
10020020−0.0050.01435.20100.00.0020.0064.60100.00.9370.9131.9980.003
1005050−0.0200.02749.5069.00.0000.0145.8098.100.8090.76812.7270.058
1005020−0.0210.03545.1067.00.0080.0144.20100.00.8000.8143.8500.011

Note: In calculating the empirical size, the nominal size is set to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95. F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average FL in 4.7.

Table 5.

Properties of block‐GLASSO and conventional GLASSO in model (4.1)–(4.2): non‐normal errors, β=0.

NTGBlock‐GLASSOConventional GLASSO
F1AUCELFLF1AUCELFL
50200250.9300.88126.8850.6040.6390.28066.7600.832
50200100.9190.90334.6970.6360.6390.28066.7600.832
5050250.8290.81426.8030.5760.7260.50843.6370.823
5050100.8190.83034.5720.6270.6210.34767.2440.835
5010250.6810.41326.4920.5150.5960.18344.1650.827
5010100.7120.65333.9720.5900.5510.14767.8920.839
100200500.9450.89051.8580.6050.8600.52284.7410.822
100200200.9360.91369.0410.6360.6140.262133.3730.832
10050500.8180.76951.9770.5780.6990.41785.7260.825
10050200.8000.81068.9650.6280.5940.274134.4670.836
10010500.6390.21651.4950.5260.5410.08386.9770.829
10010200.6820.48667.6710.5880.5000.071135.5530.839

Note: In calculating the empirical size, the nominal size is set to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95. F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average FL in 4.7.

Table 5.

Properties of block‐GLASSO and conventional GLASSO in model (4.1)–(4.2): non‐normal errors, β=0.

NTGBlock‐GLASSOConventional GLASSO
F1AUCELFLF1AUCELFL
50200250.9300.88126.8850.6040.6390.28066.7600.832
50200100.9190.90334.6970.6360.6390.28066.7600.832
5050250.8290.81426.8030.5760.7260.50843.6370.823
5050100.8190.83034.5720.6270.6210.34767.2440.835
5010250.6810.41326.4920.5150.5960.18344.1650.827
5010100.7120.65333.9720.5900.5510.14767.8920.839
100200500.9450.89051.8580.6050.8600.52284.7410.822
100200200.9360.91369.0410.6360.6140.262133.3730.832
10050500.8180.76951.9770.5780.6990.41785.7260.825
10050200.8000.81068.9650.6280.5940.274134.4670.836
10010500.6390.21651.4950.5260.5410.08386.9770.829
10010200.6820.48667.6710.5880.5000.071135.5530.839

Note: In calculating the empirical size, the nominal size is set to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95. F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average FL in 4.7.

Table 6.

Properties of GLS estimators of β and of the flexible block‐GLASSO applied to regression residuals in model (4.1)–(4.2) with general intrablock variation, β=1.

NTGGLS (flexible block‐GLASSO)GLS (block‐GLASSO)Flexible block‐GLASSO
BiasRMSESize (%)Power (%)BiasRMSESize (%)Power (%)F1AUCELFL
50200250.0020.01510.0598.490.0020.01510.5598.500.9020.8852.4510.060
50200100.0000.0164.5595.480.0000.0175.5295.500.9070.9022.3820.081
5050250.0000.03310.0566.830.0000.03311.0567.350.7720.7664.0720.126
505010−0.0010.0345.0550.250.0000.0346.5052.800.7970.8063.8440.111
501025−0.0080.0807.6521.43−0.0080.0808.2022.950.6480.3738.7390.492
501010−0.0150.0905.0016.58−0.0140.0876.6015.050.7020.63013.7621.180
100200500.0020.01220.83100.000.0020.01220.85100.000.9190.8955.3160.065
100200200.0000.0125.25100.000.0000.0129.05100.000.9220.9105.1270.082
10050500.0010.02311.1188.890.0010.02311.1088.900.7630.7267.3790.118
1005020−0.0010.0225.3572.97−0.0010.0236.8068.900.7840.7997.9140.108
10010500.0090.0699.0052.950.0090.0708.8052.900.5980.19217.0220.549
1001020−0.0050.0615.0520.60−0.0030.0635.5022.600.6680.46127.6721.128

Note: In calculating the empirical size, the nominal size is set to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95. F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average FL in 4.7.

Table 6.

Properties of GLS estimators of β and of the flexible block‐GLASSO applied to regression residuals in model (4.1)–(4.2) with general intrablock variation, β=1.

NTGGLS (flexible block‐GLASSO)GLS (block‐GLASSO)Flexible block‐GLASSO
BiasRMSESize (%)Power (%)BiasRMSESize (%)Power (%)F1AUCELFL
50200250.0020.01510.0598.490.0020.01510.5598.500.9020.8852.4510.060
50200100.0000.0164.5595.480.0000.0175.5295.500.9070.9022.3820.081
5050250.0000.03310.0566.830.0000.03311.0567.350.7720.7664.0720.126
505010−0.0010.0345.0550.250.0000.0346.5052.800.7970.8063.8440.111
501025−0.0080.0807.6521.43−0.0080.0808.2022.950.6480.3738.7390.492
501010−0.0150.0905.0016.58−0.0140.0876.6015.050.7020.63013.7621.180
100200500.0020.01220.83100.000.0020.01220.85100.000.9190.8955.3160.065
100200200.0000.0125.25100.000.0000.0129.05100.000.9220.9105.1270.082
10050500.0010.02311.1188.890.0010.02311.1088.900.7630.7267.3790.118
1005020−0.0010.0225.3572.97−0.0010.0236.8068.900.7840.7997.9140.108
10010500.0090.0699.0052.950.0090.0708.8052.900.5980.19217.0220.549
1001020−0.0050.0615.0520.60−0.0030.0635.5022.600.6680.46127.6721.128

Note: In calculating the empirical size, the nominal size is set to 5%, while in calculating the power we assume as an alternative hypothesis H1:β=0.95. F1 is the F1 score, AUC is the area under the ROC, EL is the average EL in 4.6 and FL is the average FL in 4.7.

Figure 1.

Block‐GLASSO ROC curves: varying values of (a) N, (b) T and (c) G. [Color figure can be viewed at wileyonlinelibrary.com]

Figure 2.

Flexible block‐GLASSO, group LASSO and conventional GLASSO: within‐block variation, N=100, T=200. [Color figure can be viewed at wileyonlinelibrary.com]

Table 3 reports the small‐sample properties of OLS and GLS estimators as well as of the block‐GLASSO. As expected in the case of cross‐sectionally correlated regression errors, the OLS estimator, while having a bias comparable to that of the GLS, has higher RMSE and is oversized for all combinations of N, T and G. Hence, ignoring the network leads to severe over‐rejection of the null hypothesis. Looking at the GLS estimator, its empirical size is close to the nominal size of 5% in most cases, although some size distortions can be observed when T=10 and G=N/2, namely, for short panels characterized by the presence of many, small groups. In fact, under this case the block‐GLASSO does not perform well, having small F1 and AUC and large EL and FL, thus confirming our asymptotic results reported in Section 3. Similar results can be observed in Table 4 for the case where the dependent variable is generated by the first‐order autoregressive model 4.3. Under non‐normal errors (Table 5), the block‐GLASSO still performs well in detecting the network, as confirmed by F1 and AUC values similar to those reported in Table 1, although its EL and FL are much higher than in the normal counterpart.

Table 6 shows results when the error covariance matrix displays general intra‐block variation (see 4.4 and 4.5). It is interesting to observe that the empirical size of the GLS estimator of β when ignoring the intra‐block variation (block‐GLASSO) is in some cases still close to the nominal value of 5%. The GLS estimator based on the more general procedure (flexible block‐GLASSO) shows a good performance only for smaller values of G, perhaps because under small G (and hence large M) the covariance of y¯gt better approximates the part of the covariance that is block‐wise. We also remark that the more flexible procedure is computationally much slower than the block‐GLASSO. Figure 2 shows the ROC for the flexible block‐GLASSO and the conventional GLASSO, as well as the group LASSO by Yuan and Lin (2006). The use of a group penalty in the group LASSO encourages the recovery of the block structure, although it does not impose it as in the block‐GLASSO. Because the group LASSO has been developed in the context of regression analysis, we apply it to our model as a neighbourhood selection problem for each node of the network. It is interesting to see from Figure 2 that the group LASSO approach performs less well than the block‐GLASSO, but slightly better than the conventional GLASSO, as the latter does not use any a priori information about the blocks.

5. An Empirical Example: Spatial Spillovers in Regional Growth and Convergence in Europe

We use block‐GLASSO for estimating a growth equation in per‐capita gross value‐added and for testing for economic convergence of European regions. The debate on whether there exists convergence in per‐capita input and income across nations is still open, with results obtained that differ depending on the sample period and the regions included, as well as the estimation methods adopted. A number of authors have highlighted the importance of incorporating spatial effects when studying economic growth and regional convergence and have proposed the use of spatial econometric techniques; see, among others, Rey and Montouri (1999), Ertur and Koch (2007) and Cuaresma and Feldkircher (2013). Spatial dependence in regional economic growth is likely to arise from technology spillover across neighbouring regions and from factor mobility, as well as from the presence of spatial heterogeneity Rey and Montouri, (1999). In the presence of spatial dependence in economic growth data, if ignored, estimates of the speed of income convergence across geographical regions will be biased.

We contribute to this literature by estimating a growth equation with spatial spillovers and we use the block‐GLASSO procedure to estimate the spatial weights matrix. We use data on gross value‐added per worker (GVA) for 1,088 NUTS3 observed over the period 1980–2012 in 14 European countries.3 The NUTS classification is a hierarchical system for dividing up the economic territory of the European Union (EU) for the purpose of socio‐economic analysis of the regions and design of EU regional policies. It subdivides the EU territory into regions at the three different levels, NUTS1, NUTS2 and NUTS3, moving from larger to smaller geographical units.

Standard neo‐classical growth models state that countries will converge to the same level of per‐capita income in the long run, independently of initial conditions, as long as there are diminishing returns to capital and labour and perfect diffusion of technology. Under this framework, poorer countries and regions grow faster than richer countries and a negative relationship between average growth rates and initial income levels is expected. Let yi,t+k=ln(GVAi,t+k/GVAit) be the growth in per‐capita GVA (expressed in euros at 2005 prices) for the NUTS3 region i over a set of non‐overlapping time intervals of length k. Our empirical model is the Gaussian CAR model for yit
E(yi,t+k|yj,t+k,j=1,2,,N,ji)=α+βln(GVAit)+j=1Nwij(yj,t+kαβln(GVAjt)),
(5.1)
Var(yi,t+k|yj,t+k,j=1,2,,N,ji)=σi2,
(5.2)
where we set k=3. Hence, a negative coefficient attached to the variable ln(GVAit) indicate that NUTS3 regions with a low initial level of income grow faster than regions with higher initial levels of income, supporting the hypothesis of absolute convergence. The use of non‐overlapping time intervals is common practice in the cross‐country growth literature, as this would decrease the influence of short‐term shocks and business cycles on economic activity, while revealing long‐run relationships. Compared to longer time intervals, the use of three‐year non‐overlapping intervals allows us to keep a sufficient number of observations to exploit the time dimension of panel data. Following existing studies on spatial interaction effects in regional economic growth models, the inclusion of the spatial lag of the dependent variable (growth rate) amongst the regressors in 5.1 aims at capturing the effect of inter‐regional flows of labour, capital and technology on growth and convergence; see Rey and Montouri (1999), Ertur and Koch (2007) and Cuaresma and Feldkircher (2013).

In 5.1, wij is the (i,j)th element of an N×N matrix, W, known as the spatial weights matrix, such that wii=0. In spatial econometrics, W is often assumed to be known using a priori information (e.g. from economic theory) on how statistical units potentially interact. Spatial weights based on geographical or travel distance, or contiguity have been used for modelling spatial spillovers in the economic growth equation, although this has been pointed out as being unrealistic Cuaresma and Feldkircher, (2013). In this application, we keep W as unknown and estimate it using our block‐GLASSO approach. While the unit of analysis is the NUTS3 region, we take as groups larger geographical areas, given by 80 NUTS1 and then 211 NUTS2 European regions. Other grouping criteria may undoubtedly be suggested, for example by looking at the literature on club convergence – see, among others, Corrado et al. (2005) – or using methods for identifying communities in social networks from the graph modelling literature Freeman, (1979).

It is interesting to observe that 5.1 and 5.2 for the conditional distribution imply the joint normal distribution Besag, (1974).
ytN(μt,Σ),
(5.3)
where Σ=(INW)1Λ, with Λ=diag(σ12,σ22,,σN2) and μ=α+βln(GVAt), provided that (INW) is invertible and (INW)1Λ is symmetric and positive‐definite. The reverse is also true: namely, if ytN(μt,Σ), where Σ is an N×N positive definite matrix, then also 5.1 and 5.2 hold, with
wij=θijθii,
(5.4)
Var(yit|yjt,j=1,2,,n,ji)=θii1;
(5.5)
see Mardia (1988) and Meinshausen and Buhlmann (2006). It follows that the problem of estimating wij in the CAR model 5.15.2 is equivalent to determining whether yit and yjt are conditionally independent, i.e. θij=0. Hence, in this application, we estimate W via Θ by imposing a block structure on Σ (and hence on Θ and W).

Table 7 offers some descriptive statistics on the variable under study, at the NUTS3 level. It is interesting to observe that the region with the highest level of per‐capita GVA (159,936 euros) is the London area, while the region with the lowest per‐capita GVA (1,842 euros) is North Portugal, which is also the region with the highest growth in per‐capita GVA (47.183%) over the three‐year time interval.

Table 7.

Descriptive statistics for NUTS3 regions

AverageStd dev.MinMax
Per‐capita GVA (euros)19,818.38,817.71,842.0159,936.1
Growth in per‐capita GVA (%)5.0057.611−63.66147.183
Table 7.

Descriptive statistics for NUTS3 regions

AverageStd dev.MinMax
Per‐capita GVA (euros)19,818.38,817.71,842.0159,936.1
Growth in per‐capita GVA (%)5.0057.611−63.66147.183

Table 8 reports estimates of growth equations 5.1 and 5.2. The first column provides OLS estimates ignoring the spatial structure of data, while the second and third columns show GLS estimates where contemporaneous correlation is incorporated and estimated by block‐GLASSO. The value of the coefficient of the initial per‐capita GVA of NUTS 3 provinces is negative and significant, showing the presence of (absolute) convergence in all regressions. However, when adopting the GLS approach based on the block‐GLASSO procedure, the coefficient is smaller, leading to lower speed of convergence towards the steady state, and a longer time necessary for the regional economies to cover half of the initial lag from their steady states, when compared to traditional OLS estimation. Goodness of fit for all regressions is low, ranging between 12% and 13%, indicating that some important factors have not been included in the models.

Table 8.

Regression results

OLSGLS: NUTS1GLS: NUTS2
ln(GVAit)−0.273* (0.008)−0.227* (0.009)−0.221* (0.011)
Speed of convergence0.1060.0860.083
Half‐life7.2738.7899.045
R20.1210.1330.134
G80211
Percentage of links36.2217.23
Average path length1.6291.845
Graph centrality measures:
Degree0.1260.065
Closeness0.1010.052
Betweenness0.0100.006

Note: NUTS3 regional dummies and time dummies have been included in all regressions. * denotes significant at the 5% level. Standard errors (given in parentheses) robust to unknown heteroscedasticity have been adopted.

Table 8.

Regression results

OLSGLS: NUTS1GLS: NUTS2
ln(GVAit)−0.273* (0.008)−0.227* (0.009)−0.221* (0.011)
Speed of convergence0.1060.0860.083
Half‐life7.2738.7899.045
R20.1210.1330.134
G80211
Percentage of links36.2217.23
Average path length1.6291.845
Graph centrality measures:
Degree0.1260.065
Closeness0.1010.052
Betweenness0.0100.006

Note: NUTS3 regional dummies and time dummies have been included in all regressions. * denotes significant at the 5% level. Standard errors (given in parentheses) robust to unknown heteroscedasticity have been adopted.

The lower panel of the table reports the percentage of links, the average path length and a set of centrality measures proposed by graph theory – see Borgatti and Everett (2006) and Freeman (1979) – that are widely used to characterize the compactness of graphs. The average path length is given by the average length of all the shortest paths from or to the vertices in the network, giving an indication of how dense the network is. The graph‐level centrality measures are based on three node‐level centrality indicators (i.e. degree, closeness and betweenness), which characterize different aspects of the relative importance of each node and are commonly used in the applied literature.4 All graph‐level measures vary between zero and one, and assume their highest value when the graph has a star or wheel shape. Looking at the percentage of links, it emerges that, as expected, the estimated networks are quite dense and connected when using either NUTS1 or NUTS2 as blocks. This is confirmed by the average path length, which is very low, being around 1.6–1.8. However, the graph centrality measures are close to zero, indicating that there is no single region dominating all other regions. This is also evident from Figure 3, which shows the adjacency graph resulting from the estimation of model 5.15.2 via block‐GLASSO where NUTS1 regions are taken as blocks. We do not report the graph when using NUTS2 regions as blocks, because these are too many. It is interesting to observe that the most connected NUTS1 are also the regions with the highest per‐capita GVA, namely Greater London, Norway and South Netherlands, while the areas with a smaller number of connections are Northern Ireland and northern areas of the United Kingdom, which are also geographically isolated from the other regions. Also, in most cases, regions from the same country are connected, thus supporting previous studies using geographical contiguity or geographical distance as a metric of distance.

Figure 3.

Adjacency graph of per‐capita GVA growth: 1980–2012. [Color figure can be viewed at wileyonlinelibrary.com]

6. Concluding Remarks

In the last few years, several methods have been proposed for reducing the dimensionality problem when estimating graphical models. These methods usually exploit a priori information on possible independence between groups of observations. In this paper, we focus on the estimation of a Gaussian graphical model with a large number of variables, where dependence between variables is block‐wise because of, for example, a hierarchical or group membership structure. We propose an estimation strategy based on the GLASSO methodology applied to group averages of observations, and we derive the large‐sample properties of the proposed estimator. Our Monte Carlo experiments show that our proposed estimator greatly outperforms the conventional GLASSO when data have block‐wise dependence. These experiments also show that our procedure is robust to various deviations from block‐wise dependence. For example, the method still delivers valid inference when there is some within‐group variation, or under non‐normal errors. We have shown the usefulness of this procedure on an empirical study of economic convergence of European regions, showing that accounting for block‐wise dependence helps us to better estimate convergence parameters. Although there are many examples in economics where the membership is given, in many others this is not true, making the assumption that the block structure is known a priori too restrictive. One interesting extension of this work would be to determine endogenously the inclusion of a unit in a group as well as the size and number of the groups, following the work by Lin and Ng (2012), Bonhomme and Manresa (2015) and Ando and Bai (2016). Future work should also consider a block‐wise structure for the covariance matrix of a VAR model, within the setting proposed by Barigozzi and Brownlees (2016) and Abegaz and Wit (2013). Finally, while our approach does not allow us to estimate the covariance matrix arising from one or more common pervasive factors, it would be interesting to study the properties of an estimation procedure that first controls for common pervasive factors and then estimates the network structure using de‐factored residuals.

Acknowledgements

F. Moscone and E. Tosetti acknowledge financial support from the Engineering and Physical Sciences Research Council (EPSRC) grant, Semantic Credit Risk Assessment of Business Ecosystems (SCRIBE).

Footnotes

The matrix inversion lemma states that Bernstein, (2005)

(A+BDC)1=A1A1B(D1+CA1B)1CA1.

When T is short, our approach can be used in combination with methods for estimating short dynamic panels, such as the generalized method of moments by Arellano and Bond (1991).

The countries included in the analysis are: Austria, Belgium, Germany, Denmark, Spain, Finland, France, Ireland, Italy, Netherlands, Norway, Portugal, Sweden and the United Kingdom.

Degree is the number of links for each unit, closeness is the inverse of the average length of the shortest paths to/from all the other vertices in the graph and betweenness is the number of times a node acts as a bridge between other nodes.

References

Abegaz
,
F.
and
E.
Wit
(
2013
).
Sparse time series chain graphical models for reconstructing genetic networks
.
Biostatistics
14
,
586
99
.

Ando
,
T.
and
J.
Bai
(
2016
).
Panel data models with grouped factor structure under unknown group membership
.
Journal of Applied Econometrics
31
,
163
91
.

Anselin
,
L.
(
2010
).
Thirty years of spatial econometrics
.
Papers in Regional Science
89
,
3
25
.

Arellano
,
M.
and
S. R.
Bond
(
1991
).
Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations
.
Review of Economic Studies
58
,
277
97
.

Bai
,
J.
(
2003
).
Inferential theory for factor models of large dimensions
.
Econometrica
71
,
135
71
.

Barigozzi
,
M.
and
C.
Brownlees
(
2016
).
NETS: network estimation for time series
. Working paper, Universität Pompeu Fabra (https://ssrn.com/abstract=2249909).

Bayer
,
P.
,
S.
Ross
and
G.
Topa
(
2008
).
Place of work and place of residence: informal hiring networks and labor market outcomes
.
Journal of Political Economy
116
,
1150
96
.

Bernstein
,
D. S.
(
2005
).
Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory
.
Princeton, NJ
:
Princeton University Press
.

Bertrand
,
M.
,
E. F. P.
Luttermer
and
S.
Mullainathan
(
2000
).
Network effects and welfare cultures
.
Quarterly Journal of Economics
115
,
1019
55
.

Besag
,
A. A. J.
(
1974
).
Spatial interaction and the statistical analysis of lattice systems
.
Journal of the Royal Statistical Society
, Series B
36
,
192
236
.

Blume
,
L. E.
,
W. A.
Brock
,
S. N.
Durlauf
and
R.
Jayaraman
(
2013
).
Linear social interactions models
. NBER Working Paper 19212, National Bureau of Economic Research.

Bonhomme
,
S.
and
E.
Manresa
(
2015
).
Grouped patterns of heterogeneity in panel data
.
Econometrica
83
,
1147
84
.

Borgatti
,
S. P.
and
M. G.
Everett
(
2006
).
A graph‐theoretic perspective on centrality
.
Social Networks
28
,
466
84
.

Calvó‐Armengol
,
A.
,
E.
Patacchini
and
Y.
Zenou
(
2009
).
Peer effects and social networks in education
.
Review of Economic Studies
76
,
1239
67
.

Corrado
,
L.
,
R.
Martin
and
M.
Weeks
(
2005
).
Identifying and interpreting regional convergence clusters across Europe
.
Economic Journal
115
,
C133
60
.

Cressie
,
N.
(
1993
).
Statistics for Spatial Data
.
New York, NY
:
Wiley
.

Cuaresma
,
J. C.
and
M.
Feldkircher
(
2013
).
Spatial filtering, model uncertainty and the speed of income convergence in Europe
.
Journal of Applied Econometrics
28
,
720
41
.

Durlauf
,
S.
and
H. Peyton
Young
(
2001
).
The new social economics
. In
S.
Durlauf
and
H. Peyton
Young
(Eds.),
Social Dynamics
,
1
14
.
Cambridge, MA
:
MIT Press
.

Ertur
,
C.
and
W.
Koch
(
2007
).
Growth, technological interdependence and spatial externalities: theory and evidence
.
Journal of Applied Econometrics
22
,
1033
62
.

Forgy
,
E. W.
(
1965
).
Cluster analysis of multivariate data: efficiency vs interpretability of classifications
.
Biometrics
21
,
768
9
.

Freeman
,
L. C.
(
1979
).
Centrality in social networks: conceptual clarification
.
Social Networks
1
,
215
39
.

Friedman
,
J.
,
T.
Hastie
and
R.
Tibshirani
(
2008
).
Sparse inverse covariance estimation with the graphical LASSO
.
Biostatistics
9
,
432
41
.

Goldstein
,
H.
(
2011
).
Multilevel Statistical Models
(4th ed.).
New York, NY
:
Wiley
.

Guo
,
J.
,
E.
Levina
,
G.
Michailidis
and
J.
Zhu
(
2011
).
Joint estimation of multiple graphical models
.
Biometrika
98
,
1
15
.

Hao
,
D.
,
C.
Ren
and
C.
Li
(
2012
).
Revisiting the variation of clustering coefficient of biological networks suggests new modular structure
.
BMC Systems Biology
6
,
34
.

Hsieh
,
C.
,
M. A.
Sustik
,
I. S.
Dhillon
and
P.
Ravikumar
(
2014
).
QUIC: quadratic approximation for sparse inverse covariance estimation
.
Journal of Machine Learning Research
15
,
2911
47
.

Laird
,
N. M.
and
J. H.
Ware
(
1982
).
Random‐effects models for longitudinal data
.
Biometrics
38
,
963
74
.

Lam
,
C.
and
J.
Fan
(
2009
).
Sparsistency and rates of convergence in large covariance matrix estimation
.
Annals of Statistics
37
,
4254
78
.

Lauritzen
,
S. L.
(
1996
).
Graphical Models
. Oxford Statistical Science Series.
Oxford
:
Oxford University Press
.

Lee
,
L. F.
(
2002
).
Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models
.
Econometric Theory
18
,
252
77
.

Lee
,
L.‐F.
and
J.
Yu
(
2007
).
Near unit root in the spatial autoregressive model
. Working paper, Ohio State University.

Lin
,
C.
and
S.
Ng
(
2012
).
Estimation of panel data models with parameter heterogeneity when group membership is unknown
.
Journal of Econometric Methods
1
,
42
55
.

Luo
,
X.
(
2015
).
A hierarchical graphical model for big inverse covariance estimation with an application to fMRI
. Working paper, Cornell University.

Lysen
,
S.
(
2009
).
Permuted Inclusion Criterion: A Variable Selection Technique
. Unpublished PhD thesis, University of Pennsylvania.

Magnus
,
J. R.
(
1982
).
Multivariate error components analysis of linear and nonlinear regression models by maximum likelihood
.
Journal of Econometrics
19
,
239
85
.

Mardia
,
K. V.
(
1988
).
Multi‐dimensional multivariate Gaussian Markov random fields with application to image processing
.
Journal of Multivariate Analysis
24
,
265
84
.

Mazumder
,
R.
and
T.
Hastie
(
2012
).
Exact covariance thresholding into connected components for large‐scale graphical LASSO
.
Journal of Machine Learning Research
13
,
1436
62
.

Meinshausen
,
N.
and
P.
Buhlmann
(
2006
).
High‐dimensional graphs and variable selection with the LASSO
.
Annals of Statistics
34
,
1436
62
.

Peng
,
J.
,
P.
Wang
,
N.
Zhou
and
J.
Zhu
(
2009
).
Partial correlation estimation by joint sparse regression models
.
Journal of the American Statistical Association
104
,
735
46
.

Pesaran
,
M. H.
(
2006
).
Estimation and inference in large heterogenous panels with multifactor error structure
.
Econometrica
74
,
967
1012
.

Rey
,
S.
and
B.
Montouri
(
1999
).
US regional income convergence: a spatial econometric perspective
.
Regional Studies
33
,
143
56
.

Rothman
,
A. J.
,
P. J.
Bickel
,
E.
Levina
and
J.
Zhu
(
2008
).
Sparse permutation invariant covariance estimation
.
Electronic Journal of Statistics
2
,
494
515
.

Schelldorfer
,
J.
,
L.
Meier
and
P.
Bühlmann
(
2014
).
GLMMLasso: an algorithm for high‐dimensional generalized linear mixed models using L1‐penalization
.
Journal of Computational and Graphical Statistics
23
,
460
77
.

Sirakaya
,
S.
(
2006
).
Recidivism and social interactions
.
Journal of the American Statistical Association
101
,
863
75
.

Stock
,
J.
and
M.
Watson
(
2010
).
Dynamic Factor Models
.
Oxford
:
Oxford University Press
.

Vinciotti
,
V.
,
L.
Augugliaro
,
A.
Abbruzzo
and
E.
Wit
(
2016
).
Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks
.
Statistical Applications in Genetics and Molecular Biology
15
,
193
212
.

Wit
,
E.
and
A.
Abbruzzo
(
2015
).
Factorial graphical models for dynamic networks
.
Network Science
3
,
37
57
.

Yuan
,
M.
and
Y.
Lin
(
2006
).
Model selection and estimation in regression with grouped variables
.
Journal of the Royal Statistical Society
, Series B
68
,
49
67
.

Appendix

Log‐likelihood function for networks with block‐wise dependence structure

Let S be the N×N sample covariance matrix based on a sample of size T from the random vector, y:
S=1Tt=1Ty1t2y1ty2ty1tyNty2ty1ty2t2y2tyNtyNty1tyNt2.
To obtain the log‐likelihood function, we first compute simplified expressions for ln|Θ| and Tr(SΘ), with Θ=Σ1 and Σ given by expression 2.4. Using results in Magnus (1982), we have
ln|Θ|=ln|((MΣG+ΓG)11M1M)+(ΓG1(IM1M1M))|
A.1
=ln|MΣG+ΓG|(M1)ln|ΓG|.
A.2
Letting ΨG=ΣG+(1/M)ΓG and φgh be the generic element of ΦG=ΨG1, we have
Tr(SΘ)=i,j=1Nsijθji=i=1Nsiiθii+i,jg,ijg=1Gsijθji+g,h=1:ghGig:jhsijθji=g=1Gigsii(1M2φgg+M1Mγg1)+g=1Gi,jg:ijsij(1M2φgg1Mγg1)+1M2g,h=1Gig:jh:ghsijφhg=1M2g,h=1Gig,jhsijφgh+M1Mg=1Gigsiiγg11Mg=1Gi,jg:ijsijγg1.
Replacing the expressions for sij, we obtain
Tr(SΘ)=1T1M2t=1Tg,h=1Gig:jhyityjtφgh+M1M1Tt=1Tg=1Gigyit2γg11M1Tt=1Tg=1Gij:i,jgyityjtγg1.
It follows that the likelihood function is
l(θ)ln|MΣG+ΓG|(M1)ln|ΓG|1MTt=1T(1Mg,h=1Gig:jhyityjtφgh+(M1)g=1Gigyit2γg1g=1Gij:i,jgyityjtγg1).

 

Proof of Theorem 3.1:
Note that, from 2.6, and using 3.4, we have
Θ=(1MΦG1M1M)+(ΓG1(IM1M1M)).
Hence, it follows that
Θ̂Θ=(1M(Φ̂GΦG)1M1M)+((Γ̂G1ΓG1)(IM1M1M)).
Noting that, given two matrices A and B, ABF=AFBF – see, e.g. Bernstein (2005), p. 676 – and because (1/M)1MF=(1/M)1MF=1 and IM(1/M)1MF=M1, we have
Θ̂ΘF1MΦ̂GΦGF+M1Γ̂G1ΓG1F.
By Theorem 1 in Rothman et al. (2008), we have
Φ̂GΦGF=Op((G+sG)lnGT);
see also Theorem 1 in Lam and Fan (2009). Further, using the properties of moments of quadratic forms, it is easy to show that γ̂gγg=Op(1/MT), so that
Γ̂G1ΓG1F=Op(GMT).
A.3
It follows that
Θ̂ΘF=Op(1M(G+sG)lnGT)+Op(1MT)=Op(1M(G+sG)lnGT).

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Supplementary data