Summary
Networks with a very large number of nodes appear in many application areas and pose challenges for traditional Gaussian graphical modelling approaches. In this paper, we focus on the estimation of a Gaussian graphical model when the dependence between variables has a block‐wise structure. We propose a penalized likelihood estimator of the inverse covariance matrix, also called the Graphical LASSO, applied to block averages of observations, and we derive its asymptotic properties. Monte Carlo experiments comparing the properties of our estimator with those of the conventional Graphical LASSO show that the proposed approach works well in the presence of a block‐wise dependence structure and that it is also robust to possible model misspecification. We conclude the paper with an empirical study on economic growth and convergence of 1,088 small European regions over the years 1980 to 2012. While requiring a priori information on the block structure – e.g. given by the hierarchical structure of the data – our approach can be adopted for estimation and prediction using very large panel data sets. It is also particularly useful when there is a problem of missing values and outliers, or when the focus of the analysis is on out‐of‐sample prediction.
1. Introduction
Estimation of large covariance matrices and their inverses has several applications in various areas, from economics and finance to health, biology, computer science and engineering. One important technique developed in the statistical and computer science literature is the graphical modelling approach, which aims at exploring the relationships among a set of random variables through their joint distribution. Under this framework, the Gaussian distribution is often assumed and, in this case, the dependence structure is completely determined by the covariance matrix or, equivalently, by its inverse, whose off‐diagonal elements are proportional to partial correlations (Lauritzen, 1996). Specifically, variables i and j are conditionally independent given all other variables if and only if the (i, j)th off‐diagonal element of the inverse covariance (precision) matrix is zero.
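This zero‐pattern characterization can be checked numerically on a small example (a minimal sketch with an illustrative 3 × 3 precision matrix, not taken from the paper):

```python
import numpy as np

# Illustrative 3-variable precision matrix: Theta[0, 2] = 0 encodes that
# variables 1 and 3 are conditionally independent given variable 2.
Theta = np.array([[ 2.0, -0.8,  0.0],
                  [-0.8,  2.0, -0.8],
                  [ 0.0, -0.8,  2.0]])

Sigma = np.linalg.inv(Theta)  # the covariance itself need not share that zero

def partial_corr(Theta, i, j):
    # Standard identity: rho_{ij | rest} = -Theta_ij / sqrt(Theta_ii * Theta_jj)
    return -Theta[i, j] / np.sqrt(Theta[i, i] * Theta[j, j])

print(partial_corr(Theta, 0, 2) == 0)  # True: conditional independence
print(Sigma[0, 2] != 0)                # True: yet marginally correlated
```

Note that the marginal covariance between variables 1 and 3 is nonzero even though their partial correlation vanishes, which is why the graph is read off the precision matrix rather than the covariance matrix.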
Conditional Gaussian models are known in the spatial econometrics literature as conditional autoregressive (CAR) models, representing data from a given spatial location as a function of data in neighbouring locations; see, e.g. Cressie (1993) and Anselin (2010). In a CAR model, the neighbourhood structure is represented by means of the so‐called spatial weights matrix, usually assumed to be known a priori using information on distance between units, such as the geographic, economic, policy or social distance. It is interesting to observe that the problem of estimating the spatial weights matrix in a CAR model is equivalent to a neighbourhood selection problem in a graphical model; for more details, see Section 5. Hence, the spatial weights matrix for CAR models can be estimated by using methods from the Gaussian graphical modelling literature for estimating inverse covariance matrices. While the spatial econometrics literature has been largely immune to the developments in Gaussian graphical modelling, these methods may be useful for a large number of applications in the social sciences.
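The equivalence can be made concrete with a small sketch: in its simplest homoscedastic form, a Gaussian CAR model with symmetric weights matrix W has joint precision proportional to I − ρW, so the zeros of the precision matrix coincide with the non‐neighbour pairs (the values of ρ, σ² and W below are assumptions for illustration):

```python
import numpy as np

# Hypothetical weights matrix: 4 units on a line, neighbours 1-2, 2-3, 3-4.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])

rho, sigma2 = 0.4, 1.0                 # assumed spatial parameter and variance
Q = (np.eye(4) - rho * W) / sigma2     # joint precision of a homoscedastic CAR

# Estimating the support of W is the same problem as finding the zeros of Q:
off = ~np.eye(4, dtype=bool)
print(np.array_equal(Q[off] == 0, W[off] == 0))   # True
print(bool(np.all(np.linalg.eigvalsh(Q) > 0)))    # True: valid precision matrix
```

Hence a graphical-model estimate of the precision matrix immediately delivers an estimate of the neighbourhood structure of the CAR model.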
In this paper, we consider the case of networks with a very large number of nodes and we focus on the estimation of Gaussian graphical models when the dependence between variables has a block‐wise structure. We assume that units can be split into a set of non‐overlapping groups, or blocks, in such a way that the dependence between units only varies across blocks, instead of individual observations. Hence, rather than estimating the links between each pair of units in the sample, we propose to estimate the dependence (links) between groups of cross‐sectional units. Our approach consists of applying the Graphical LASSO (GLASSO) methodology of Friedman et al. (2008) to block‐level averages of observations rather than to single observations. When the size of the group is unity, our method collapses to the conventional GLASSO. A major advantage of this method is that its computational cost is greatly reduced and hence it can be adopted for estimation and prediction using very large, or huge, networks. Our approach is also particularly useful when there is a problem of missing values and outliers or when the focus of the analysis is on out‐of‐sample prediction.
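The two steps just described can be sketched in a few lines, with scikit-learn's `GraphicalLasso` standing in for the Friedman et al. (2008) algorithm; the dimensions, the penalty `alpha` and the placeholder data below are illustrative assumptions, not the paper's design:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
N, T, G = 60, 200, 12                  # toy dimensions: units, periods, blocks
M = N // G                             # equal block size (here M = 5)
groups = np.repeat(np.arange(G), M)    # a priori known block membership

X = rng.standard_normal((T, N))        # placeholder T x N panel of observations

# Step 1: collapse the N series into G block-level averages.
Xbar = np.stack([X[:, groups == g].mean(axis=1) for g in range(G)], axis=1)

# Step 2: apply the conventional GLASSO to the much smaller G x G problem.
est = GraphicalLasso(alpha=0.05).fit(Xbar)
Theta_blocks = est.precision_          # sparse G x G block-level precision

print(Theta_blocks.shape)              # (12, 12) rather than (60, 60)
```

With groups of size one, `Xbar` equals `X` and the procedure reduces to the conventional GLASSO, matching the remark above.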
There exist several examples where it is reasonable to assume a block‐wise dependence structure between units. In economics, individuals belonging to the same household may adjust their preferences for consumer goods similarly in response to the consumption decisions of neighbouring households. Companies belonging to the same sector of economic activity and located within the same geographical area (e.g. the postcode, the region or the country) tend to behave similarly because they have similar characteristics or face similar opportunities and constraints. Thus, it is reasonable to assume that the way they interact with companies from other sectors and/or geographical areas is similar. A block‐wise dependence structure is also a realistic assumption when the variable of interest displays an explicit hierarchical or group‐membership structure, namely, clustering of units in an organized fashion, such as students within classrooms, members of a household, General Practitioners in a clinic, etc. This is common, for example, when dealing with large, individual‐level, microeconomic or health data sets. Other examples are in neuroscience, where the networks used to represent brain activity have a hierarchical structure, with billions of neurons connected to each other through hub nodes, called voxels, and with connected voxels forming areas that are again connected with each other (Luo, 2015). In biology, regulatory networks are thought to have a hub‐type structure, with groups of genes having a similar dependency structure and regulated by a small number of unobserved proteins (Hao et al., 2012). When the grouping is not fully known a priori, we could determine endogenously the optimal grouping of cross‐sectional units by employing techniques from the clustering literature; see, e.g. Lin and Ng (2012), Bonhomme and Manresa (2015) and Ando and Bai (2016).
Exploitation of a priori information on the group structure of variables is not new in the social interaction literature or in the statistical and graphical modelling literature. Empirical works from the social interaction literature typically assume that an individual reacts to the average of others in a predefined group; see Durlauf and Young (2001) and Blume et al. (2013) for a review. Such an assumption implies that the spatial weights matrix has a group‐membership structure, where the weights are identical for all units belonging to the same group, while they are set to zero for the interaction between units belonging to different groups. Lee and Yu (2007) considered the identification and estimation of interaction effects in the context of a spatial autoregressive model where the spatial weights matrix (and the associated precision matrix) has such a block diagonal structure with equal entries. Note that this is a more restrictive assumption than the one used in this paper, as it does not allow for dependence between groups. Nevertheless, this model has been widely adopted in several different areas of the social sciences, such as education (Calvó‐Armengol et al., 2009), labour market outcomes (Bayer et al., 2008), crime (Sirakaya, 2006) and welfare participation (Bertrand et al., 2000). Similar models have been proposed in the statistical literature, where mixed effect models are commonly used to represent variables with a hierarchical or known group‐membership structure (Goldstein, 2011). When the random effects are assumed to be correlated, these models lead to a covariance matrix that has a block‐wise structure of the same type that we use in this paper, with equal correlation within groups and equal correlation between any two elements of two specified groups (Laird and Ware, 1982). Maximum likelihood approaches are typically used for parameter estimation in these models.
In the case of a large number of regressors, penalized approaches based on the L1 penalty are used for estimation and variable selection (Schelldorfer et al., 2014). However, these methods typically require a small number of random effects (blocks).
A number of authors in the literature on graphical modelling have proposed sparse estimation of graphs with a block structure. These methods exploit a priori information on group membership of observations to propose fast, sparse estimation algorithms. Guo et al. (2011) consider a heterogeneous data set where variables, while independent across groups, have a sparse dependency structure within group. The corresponding precision matrix has a block diagonal structure, and the authors propose joint estimation of various blocks by maximizing the corresponding penalized log‐likelihood functions. A similar approach is taken by Mazumder and Hastie (2012), who propose thresholding estimation of a sparse inverse covariance that is a block diagonal matrix of connected components. Wit and Abbruzzo (2015) impose block equality constraints on the parameters of an undirected graphical model to reduce the number of parameters to be estimated. Vinciotti et al. (2016) discuss various forms of block structures for dynamic networks and propose estimation of the associated precision matrix under sparsity and equality constraints on parameters (also known as parameter tying). The inclusion of equality constraints, while reducing the number of parameters, often increases the computational complexity of the estimation procedures. For example, the general block structures considered by Wit and Abbruzzo (2015) and Vinciotti et al. (2016) imply a computational cost of the estimation procedure that is higher compared to the approaches of Guo et al. (2011) and Mazumder and Hastie (2012), where the assumed block structure allows the large GLASSO problem to be split into many smaller, tractable problems.
In this paper, we use block structures with the intent to achieve computational efficiency, allowing us to infer networks of very large dimensions. Differently from Guo et al. (2011) and Mazumder and Hastie (2012), our approach does not need to impose block‐diagonality of the precision matrix. However, we assume that units can be split into groups in such a way that the covariance (and associated precision matrix) only varies across blocks, rather than individual observations.
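As a concrete illustration of this assumption (with hypothetical numbers), a covariance matrix of this kind can be built by expanding a G × G block‐level covariance to the unit level:

```python
import numpy as np

G, M = 3, 4                      # 3 blocks of 4 units each, N = 12
N = G * M

# Hypothetical block-level covariance: note it is NOT block diagonal
# (blocks 1-2 and 2-3 are dependent); it just cannot vary within a block.
Sigma_G = np.array([[1.0, 0.4, 0.0],
                    [0.4, 1.0, 0.3],
                    [0.0, 0.3, 1.0]])
sigma2_e = 0.5                   # idiosyncratic variance, keeps Sigma positive definite

# Expand to the unit level: every pair of units in blocks (g, h) shares
# the same covariance Sigma_G[g, h].
Sigma = np.kron(Sigma_G, np.ones((M, M))) + sigma2_e * np.eye(N)

print(Sigma[0, 4] == Sigma[3, 7])                   # True: both pairs span blocks 1-2
print(bool(np.all(np.linalg.eigvalsh(Sigma) > 0)))  # True: valid covariance
```

The Kronecker construction makes explicit that only G(G+1)/2 block-level parameters (plus the idiosyncratic variance) govern the N × N covariance.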
The rest of the paper is structured as follows. In Section 2, we describe the main features of our graphical model with block‐wise dependence structure, while in Section 3 we propose our estimator based on GLASSO. In Section 4, we run Monte Carlo experiments to investigate the small‐sample properties of the proposed estimator. In Section 5, we carry out an empirical study on the economic growth of a set of small regions in Europe. Finally, in Section 6, we provide some concluding remarks. The Appendix provides the proofs.
We use
2. Block‐Wise Dependence Structure in Huge Networks
3. Block‐Glasso Approach
Step 1. Estimate
by applying the GLASSO to , . This allows us to obtain for , and .
The following theorems derive the asymptotic properties of estimator (13) when both N and T go to infinity.
Suppose all conditions in Theorem 3.1 hold, and that
Theorem 3.2 is a straightforward consequence of the sparsistency theorem of Lam and Fan (2009) applied to
Hence, for
A major advantage of our proposed estimation procedure is that it is considerably faster than the conventional GLASSO for estimating an
Finally, it is important to remark that our approach does not allow us to estimate consistently the precision matrix when this arises from one or more common, pervasive factors. Unobserved common factors occur in time series as a result of global shocks, namely unexpected events that may hit all statistical units, although with different intensities (Stock and Watson, 2010). These large‐scale perturbations affect micro‐level population units and are often responsible for observable co‐movements of a large number of time series. We observe that our model is more parsimonious than the common factor specification and may be useful in situations where T is too short to allow for fully unrestricted common effects. However, in a large T setting, in the presence of unobserved common factors, our approach can be applied to de‐factored residuals, after estimating the common factors using methods such as principal components (Bai, 2003) or the Common Correlated Effects methodology (Pesaran, 2006).
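The de‐factoring step can be sketched as follows, extracting principal‐component factors via an SVD; the one‐factor design and the choice k = 1 are illustrative assumptions, and in practice k would be selected by the practitioner (e.g. with an information criterion):

```python
import numpy as np

def defactor(X, k):
    """Remove k principal-component factors from a T x N panel (a sketch)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :k] * s[:k]         # T x k estimated common factors
    L = Vt[:k, :]                # k x N factor loadings
    return Xc - F @ L            # de-factored residuals

# Toy data: one pervasive common factor plus idiosyncratic noise.
rng = np.random.default_rng(1)
T, N = 200, 50
f = rng.standard_normal((T, 1))
lam = rng.standard_normal((1, N))
X = f @ lam + rng.standard_normal((T, N))

E = defactor(X, k=1)             # block-GLASSO would then be applied to E
print(bool(np.linalg.norm(f.T @ E) < np.linalg.norm(f.T @ X)))  # True: factor removed
```

After this step, the residuals `E` carry the remaining cross-sectional dependence, to which the block-GLASSO procedure can be applied.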
3.1. Case of Blocks With Unequal Size
3.2. Allowing for General Intra‐Block Correlation Structure
We can show that, under the condition that
4. Monte Carlo Experiments
In each experiment, we compute the block‐GLASSO and the conventional GLASSO, for all pairs of N and T with
4.1. Results
The results are summarized in Tables 1–6 and Figures 1–2. The results from Table 1 show that, when data have block‐wise dependence structure, our method greatly outperforms the conventional GLASSO for all combinations of N, T and G. In particular, the F1 score and AUC show that block‐GLASSO has higher true positive rates and substantially lower false positive rates, while the EL and FL are always lower for block‐GLASSO, indicating that the latter provides a better estimation of the precision matrix. However, it is interesting to note that when
Properties of block‐GLASSO and conventional GLASSO in model (4.1)–(4.2),
| N | T | G | Block F1 | Block AUC | Block EL | Block FL | Conv. F1 | Conv. AUC | Conv. EL | Conv. FL |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 200 | 25 | 0.929 | 0.881 | 2.894 | 0.015 | 0.869 | 0.551 | 15.063 | 0.491 |
| 50 | 200 | 10 | 0.923 | 0.906 | 0.800 | 0.003 | 0.638 | 0.285 | 19.694 | 0.472 |
| 50 | 50 | 25 | 0.828 | 0.818 | 6.099 | 0.056 | 0.719 | 0.509 | 27.918 | 0.679 |
| 50 | 50 | 10 | 0.817 | 0.786 | 1.562 | 0.010 | 0.670 | 0.457 | 27.650 | 0.678 |
| 50 | 10 | 25 | 0.665 | 0.400 | 13.167 | 0.571 | 0.578 | 0.172 | 43.232 | 0.829 |
| 50 | 10 | 10 | 0.707 | 0.640 | 3.668 | 0.063 | 0.548 | 0.147 | 65.296 | 0.827 |
| 100 | 200 | 50 | 0.948 | 0.894 | 6.458 | 0.015 | 0.863 | 0.529 | 35.085 | 0.538 |
| 100 | 200 | 20 | 0.944 | 0.912 | 1.970 | 0.003 | 0.668 | 0.303 | 45.417 | 0.531 |
| 100 | 50 | 50 | 0.819 | 0.772 | 12.855 | 0.053 | 0.689 | 0.415 | 61.453 | 0.717 |
| 100 | 50 | 20 | 0.801 | 0.812 | 3.821 | 0.010 | 0.597 | 0.281 | 84.888 | 0.710 |
| 100 | 10 | 50 | 0.620 | 0.207 | 26.570 | 0.601 | 0.523 | 0.079 | 86.485 | 0.827 |
| 100 | 10 | 20 | 0.675 | 0.475 | 8.299 | 0.064 | 0.498 | 0.071 | 135.156 | 0.838 |
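The support‐recovery and loss columns of the tables can be computed with standard definitions. Below, the F1 score is taken over the off‐diagonal support of the precision matrix and the Frobenius loss is one common convention for FL; the paper's exact EL and FL definitions are not reproduced here, so treat these helpers as assumptions:

```python
import numpy as np

def edge_f1(Theta_hat, Theta_true, tol=1e-8):
    """F1 score for recovering the off-diagonal support (edges) of Theta."""
    off = ~np.eye(Theta_true.shape[0], dtype=bool)
    est = np.abs(Theta_hat[off]) > tol
    tru = np.abs(Theta_true[off]) > tol
    tp = np.sum(est & tru)
    prec = tp / max(est.sum(), 1)
    rec = tp / max(tru.sum(), 1)
    return 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0

def frobenius_loss(Theta_hat, Theta_true):
    """Frobenius norm of the estimation error (one convention for FL)."""
    return np.linalg.norm(Theta_hat - Theta_true, 'fro')

Theta_true = np.array([[ 2.0, -0.5, 0.0],
                       [-0.5,  2.0, 0.0],
                       [ 0.0,  0.0, 2.0]])
Theta_hat = np.array([[ 2.1, -0.4, 0.0],
                      [-0.4,  2.0, 0.1],
                      [ 0.0,  0.1, 2.0]])   # one spurious edge between 2 and 3

print(round(edge_f1(Theta_hat, Theta_true), 3))  # 0.667: precision 0.5, recall 1
```

Higher F1 and AUC therefore indicate better edge recovery, while lower EL and FL indicate a more accurate estimate of the precision matrix itself, which is how the table is read in the text.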
Properties of block‐GLASSO with large N in model (4.1)–(4.2),
| N | T | G | F1 | AUC | EL | FL |
|---|---|---|---|---|---|---|
| 500 | 20 | 50 | 0.657 | 0.421 | 13.757 | 0.011 |
| 500 | 20 | 100 | 0.656 | 0.248 | 33.660 | 0.029 |
| 500 | 20 | 250 | 0.616 | 0.092 | 94.290 | 0.168 |
| 1,000 | 20 | 50 | 0.649 | 0.402 | 12.306 | 0.005 |
| 1,000 | 20 | 100 | 0.631 | 0.232 | 30.079 | 0.011 |
| 1,000 | 20 | 250 | 0.613 | 0.094 | 90.020 | 0.040 |
| 2,000 | 20 | 50 | 0.641 | 0.388 | 11.429 | 0.003 |
| 2,000 | 20 | 100 | 0.624 | 0.228 | 28.523 | 0.010 |
| 2,000 | 20 | 250 | 0.609 | 0.090 | 87.742 | 0.009 |
Properties of OLS and GLS estimators of β and of block‐GLASSO applied to regression residuals in model (4.1)–(4.2),
| N | T | G | OLS Bias | OLS RMSE | OLS Size (%) | OLS Power (%) | GLS Bias | GLS RMSE | GLS Size (%) | GLS Power (%) | F1 | AUC | EL | FL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 200 | 25 | 0.000 | 0.013 | 14.80 | 100.0 | 0.000 | 0.008 | 4.80 | 100.0 | 0.928 | 0.881 | 2.915 | 0.015 |
| 50 | 200 | 10 | 0.000 | 0.016 | 22.00 | 99.60 | 0.001 | 0.008 | 4.60 | 100.0 | 0.918 | 0.898 | 0.794 | 0.003 |
| 50 | 50 | 25 | 0.000 | 0.030 | 19.60 | 82.00 | 0.000 | 0.019 | 4.40 | 92.40 | 0.828 | 0.818 | 6.137 | 0.059 |
| 50 | 50 | 10 | −0.003 | 0.036 | 23.60 | 76.80 | 0.002 | 0.016 | 4.40 | 96.00 | 0.811 | 0.829 | 1.654 | 0.012 |
| 50 | 10 | 25 | 0.005 | 0.056 | 13.60 | 47.20 | 0.000 | 0.048 | 10.40 | 43.60 | 0.667 | 0.401 | 14.332 | 0.924 |
| 50 | 10 | 10 | −0.020 | 0.083 | 25.20 | 47.20 | 0.002 | 0.038 | 4.40 | 46.40 | 0.703 | 0.633 | 4.173 | 0.113 |
| 100 | 200 | 50 | 0.000 | 0.009 | 15.60 | 100.0 | −0.001 | 0.005 | 4.90 | 100.0 | 0.948 | 0.895 | 6.520 | 0.015 |
| 100 | 200 | 20 | 0.000 | 0.011 | 23.60 | 100.0 | 0.000 | 0.006 | 4.50 | 100.0 | 0.937 | 0.912 | 1.990 | 0.002 |
| 100 | 50 | 50 | −0.001 | 0.017 | 18.40 | 97.20 | −0.001 | 0.012 | 6.00 | 99.20 | 0.818 | 0.774 | 12.809 | 0.060 |
| 100 | 50 | 20 | 0.002 | 0.026 | 24.80 | 91.20 | 0.000 | 0.011 | 5.40 | 100.0 | 0.798 | 0.806 | 3.925 | 0.012 |
| 100 | 10 | 50 | 0.002 | 0.044 | 16.80 | 57.20 | −0.001 | 0.034 | 8.00 | 55.60 | 0.623 | 0.207 | 29.313 | 0.983 |
| 100 | 10 | 20 | 0.001 | 0.064 | 22.40 | 59.20 | 0.001 | 0.033 | 5.20 | 66.40 | 0.672 | 0.470 | 9.252 | 0.111 |
Properties of OLS and GLS estimators of
| N | T | G | OLS Bias | OLS RMSE | OLS Size (%) | OLS Power (%) | GLS Bias | GLS RMSE | GLS Size (%) | GLS Power (%) | F1 | AUC | EL | FL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 200 | 25 | −0.006 | 0.017 | 33.20 | 98.00 | −0.001 | 0.010 | 6.40 | 100.0 | 0.930 | 0.882 | 2.860 | 0.015 |
| 50 | 200 | 10 | −0.008 | 0.022 | 31.10 | 92.00 | 0.001 | 0.011 | 5.60 | 100.0 | 0.912 | 0.896 | 0.803 | 0.003 |
| 50 | 50 | 25 | −0.022 | 0.037 | 39.20 | 49.00 | −0.003 | 0.021 | 10.00 | 82.30 | 0.828 | 0.818 | 6.042 | 0.057 |
| 50 | 50 | 10 | −0.019 | 0.046 | 35.40 | 56.00 | 0.003 | 0.020 | 5.20 | 89.20 | 0.808 | 0.825 | 1.596 | 0.011 |
| 100 | 200 | 50 | −0.004 | 0.011 | 27.10 | 100.0 | 0.000 | 0.007 | 5.00 | 100.0 | 0.949 | 0.895 | 6.519 | 0.015 |
| 100 | 200 | 20 | −0.005 | 0.014 | 35.20 | 100.0 | 0.002 | 0.006 | 4.60 | 100.0 | 0.937 | 0.913 | 1.998 | 0.003 |
| 100 | 50 | 50 | −0.020 | 0.027 | 49.50 | 69.0 | 0.000 | 0.014 | 5.80 | 98.10 | 0.809 | 0.768 | 12.727 | 0.058 |
| 100 | 50 | 20 | −0.021 | 0.035 | 45.10 | 67.0 | 0.008 | 0.014 | 4.20 | 100.0 | 0.800 | 0.814 | 3.850 | 0.011 |
Properties of block‐GLASSO and conventional GLASSO in model (4.1)–(4.2): non‐normal errors,
| N | T | G | Block F1 | Block AUC | Block EL | Block FL | Conv. F1 | Conv. AUC | Conv. EL | Conv. FL |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 200 | 25 | 0.930 | 0.881 | 26.885 | 0.604 | 0.639 | 0.280 | 66.760 | 0.832 |
| 50 | 200 | 10 | 0.919 | 0.903 | 34.697 | 0.636 | 0.639 | 0.280 | 66.760 | 0.832 |
| 50 | 50 | 25 | 0.829 | 0.814 | 26.803 | 0.576 | 0.726 | 0.508 | 43.637 | 0.823 |
| 50 | 50 | 10 | 0.819 | 0.830 | 34.572 | 0.627 | 0.621 | 0.347 | 67.244 | 0.835 |
| 50 | 10 | 25 | 0.681 | 0.413 | 26.492 | 0.515 | 0.596 | 0.183 | 44.165 | 0.827 |
| 50 | 10 | 10 | 0.712 | 0.653 | 33.972 | 0.590 | 0.551 | 0.147 | 67.892 | 0.839 |
| 100 | 200 | 50 | 0.945 | 0.890 | 51.858 | 0.605 | 0.860 | 0.522 | 84.741 | 0.822 |
| 100 | 200 | 20 | 0.936 | 0.913 | 69.041 | 0.636 | 0.614 | 0.262 | 133.373 | 0.832 |
| 100 | 50 | 50 | 0.818 | 0.769 | 51.977 | 0.578 | 0.699 | 0.417 | 85.726 | 0.825 |
| 100 | 50 | 20 | 0.800 | 0.810 | 68.965 | 0.628 | 0.594 | 0.274 | 134.467 | 0.836 |
| 100 | 10 | 50 | 0.639 | 0.216 | 51.495 | 0.526 | 0.541 | 0.083 | 86.977 | 0.829 |
| 100 | 10 | 20 | 0.682 | 0.486 | 67.671 | 0.588 | 0.500 | 0.071 | 135.553 | 0.839 |
Properties of GLS estimators of β and of the flexible block‐GLASSO applied to regression residuals in model (4.1)–(4.2) with general intrablock variation,
| N | T | G | GLS-flex Bias | GLS-flex RMSE | GLS-flex Size (%) | GLS-flex Power (%) | GLS-block Bias | GLS-block RMSE | GLS-block Size (%) | GLS-block Power (%) | F1 | AUC | EL | FL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 200 | 25 | 0.002 | 0.015 | 10.05 | 98.49 | 0.002 | 0.015 | 10.55 | 98.50 | 0.902 | 0.885 | 2.451 | 0.060 |
| 50 | 200 | 10 | 0.000 | 0.016 | 4.55 | 95.48 | 0.000 | 0.017 | 5.52 | 95.50 | 0.907 | 0.902 | 2.382 | 0.081 |
| 50 | 50 | 25 | 0.000 | 0.033 | 10.05 | 66.83 | 0.000 | 0.033 | 11.05 | 67.35 | 0.772 | 0.766 | 4.072 | 0.126 |
| 50 | 50 | 10 | −0.001 | 0.034 | 5.05 | 50.25 | 0.000 | 0.034 | 6.50 | 52.80 | 0.797 | 0.806 | 3.844 | 0.111 |
| 50 | 10 | 25 | −0.008 | 0.080 | 7.65 | 21.43 | −0.008 | 0.080 | 8.20 | 22.95 | 0.648 | 0.373 | 8.739 | 0.492 |
| 50 | 10 | 10 | −0.015 | 0.090 | 5.00 | 16.58 | −0.014 | 0.087 | 6.60 | 15.05 | 0.702 | 0.630 | 13.762 | 1.180 |
| 100 | 200 | 50 | 0.002 | 0.012 | 20.83 | 100.00 | 0.002 | 0.012 | 20.85 | 100.00 | 0.919 | 0.895 | 5.316 | 0.065 |
| 100 | 200 | 20 | 0.000 | 0.012 | 5.25 | 100.00 | 0.000 | 0.012 | 9.05 | 100.00 | 0.922 | 0.910 | 5.127 | 0.082 |
| 100 | 50 | 50 | 0.001 | 0.023 | 11.11 | 88.89 | 0.001 | 0.023 | 11.10 | 88.90 | 0.763 | 0.726 | 7.379 | 0.118 |
| 100 | 50 | 20 | −0.001 | 0.022 | 5.35 | 72.97 | −0.001 | 0.023 | 6.80 | 68.90 | 0.784 | 0.799 | 7.914 | 0.108 |
| 100 | 10 | 50 | 0.009 | 0.069 | 9.00 | 52.95 | 0.009 | 0.070 | 8.80 | 52.90 | 0.598 | 0.192 | 17.022 | 0.549 |
| 100 | 10 | 20 | −0.005 | 0.061 | 5.05 | 20.60 | −0.003 | 0.063 | 5.50 | 22.60 | 0.668 | 0.461 | 27.672 | 1.128 |
Block‐GLASSO ROC curves: varying values of (a) N, (b) T and (c) G.
Flexible block‐GLASSO, group LASSO and conventional GLASSO: within‐block variation,
Table 3 reports the small‐sample properties of OLS and GLS estimators as well as of the block‐GLASSO. As expected in the case of cross‐sectionally correlated regression errors, the OLS estimator, while having a bias comparable to that of the GLS, has higher RMSE and is oversized for all combinations of N, T and G. Hence, ignoring the network leads to severe over‐rejection of the null hypothesis. Looking at the GLS estimator, its empirical size is close to the nominal size of 5% in most cases, although some size distortions can be observed when
Table 6 shows results when the error covariance matrix displays general intra‐block variation (see equations (4.4) and (4.5)). It is interesting to observe that the empirical size of the GLS estimator of β when ignoring the intra‐block variation (block‐GLASSO) is in some cases still close to the nominal value of 5%. The GLS estimator based on the more general procedure (flexible block‐GLASSO) shows a good performance only for smaller values of G, perhaps because under small G (and hence large M) the covariance of
5. An Empirical Example: Spatial Spillovers in Regional Growth and Convergence in Europe
We use block‐GLASSO to estimate a growth equation in per‐capita gross value‐added and to test for economic convergence of European regions. The debate on whether there exists convergence in per‐capita input and income across nations is still open, with results differing depending on the sample period and the regions included, as well as on the estimation methods adopted. A number of authors have highlighted the importance of incorporating spatial effects when studying economic growth and regional convergence and have proposed the use of spatial econometric techniques; see, among others, Rey and Montouri (1999), Ertur and Koch (2007) and Cuaresma and Feldkircher (2013). Spatial dependence in regional economic growth is likely to arise from technology spillovers across neighbouring regions and from factor mobility, as well as from the presence of spatial heterogeneity (Rey and Montouri, 1999). If such spatial dependence is ignored, estimates of the speed of income convergence across geographical regions will be biased.
We contribute to this literature by estimating a growth equation with spatial spillovers, using the block‐GLASSO procedure to estimate the spatial weights matrix. We use data on gross value‐added per worker (GVA) for 1,088 NUTS3 regions observed over the period 1980–2012 in 14 European countries. The NUTS classification is a hierarchical system for dividing up the economic territory of the European Union (EU) for the purposes of socio‐economic analysis of the regions and the design of EU regional policies. It subdivides the EU territory into regions at three different levels, NUTS1, NUTS2 and NUTS3, moving from larger to smaller geographical units.
Table 7 offers some descriptive statistics on the variable under study, at the NUTS3 level. It is interesting to observe that the region with the highest level of per‐capita GVA (159,936 euros) is the London area, while the region with the lowest per‐capita GVA (1,842 euros) is North Portugal, which is also the region with the highest growth in per‐capita GVA (47.183%) over the three‐year time interval.
Descriptive statistics for NUTS3 regions

| | Average | Std dev. | Min | Max |
|---|---|---|---|---|
| Per‐capita GVA (euros) | 19,818.3 | 8,817.7 | 1,842.0 | 159,936.1 |
| Growth in per‐capita GVA (%) | 5.005 | 7.611 | −63.661 | 47.183 |
Table 8 reports estimates of growth equations 5.1 and 5.2. The first column provides OLS estimates ignoring the spatial structure of the data, while the second and third columns show GLS estimates in which contemporaneous correlation is incorporated and estimated by block‐GLASSO. The coefficient of the initial per‐capita GVA of NUTS3 provinces is negative and significant, showing the presence of (absolute) convergence in all regressions. However, when adopting the GLS approach based on the block‐GLASSO procedure, the coefficient is smaller in absolute value, implying a lower speed of convergence towards the steady state and a longer time for the regional economies to cover half of their initial gap from the steady state, compared to traditional OLS estimation. Goodness of fit is low for all regressions, ranging between 12% and 13%, indicating that some important explanatory factors have been omitted from the models.
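The speed of convergence and half‐life in Table 8 follow from the estimated coefficient on initial per‐capita GVA. The sketch below uses one common convention for growth measured over three‐year intervals; it is our reconstruction, not the authors' code, and reproduces the published figures only up to rounding of the reported coefficients.

```python
import math

def convergence_stats(beta, tau=3.0):
    """Speed of convergence and half-life implied by the coefficient `beta`
    on initial per-capita GVA, for growth over tau-year intervals.
    One common convention; a hypothetical reconstruction."""
    speed = -math.log(1.0 + beta) / tau                       # annual convergence rate
    half_life = math.log(2.0) / -math.log(1.0 + beta / tau)   # years to halve the gap
    return speed, half_life

# OLS column of Table 8: beta = -0.273
speed, hl = convergence_stats(-0.273)
print(speed, hl)   # close to the Table 8 values 0.106 and 7.273
```

Small discrepancies with the table are expected, since the published coefficients are rounded to three decimal places.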
Regression results

| | OLS | GLS: NUTS1 | GLS: NUTS2 |
|---|---|---|---|
| Initial per‐capita GVA | −0.273* (0.008) | −0.227* (0.009) | −0.221* (0.011) |
| Speed of convergence | 0.106 | 0.086 | 0.083 |
| Half‐life | 7.273 | 8.789 | 9.045 |
| R2 | 0.121 | 0.133 | 0.134 |
| G | – | 80 | 211 |
| Percentage of links | – | 36.22 | 17.23 |
| Average path length | – | 1.629 | 1.845 |
| Graph centrality measures: | | | |
| Degree | – | 0.126 | 0.065 |
| Closeness | – | 0.101 | 0.052 |
| Betweenness | – | 0.010 | 0.006 |

Note: NUTS3 regional dummies and time dummies are included in all regressions. * denotes significance at the 5% level. Standard errors, given in parentheses, are robust to unknown heteroscedasticity.
The lower panel of the table reports the percentage of links, the average path length and a set of centrality measures proposed by graph theory – see Borgatti and Everett (2006) and Freeman (1979) – that are widely used to characterize the compactness of graphs. The average path length is the average length of all the shortest paths between vertices in the network, giving an indication of how dense the network is. The graph‐level centrality measures are based on three node‐level centrality indicators (i.e. degree, closeness and betweenness), which characterize different aspects of the relative importance of each node and are commonly used in the applied literature.4 All graph‐level measures vary between zero and one, and attain their highest value when the graph has a star or wheel shape. Looking at the percentage of links, it emerges that, as expected, the estimated networks are quite dense and connected when using either NUTS1 or NUTS2 as blocks. This is confirmed by the average path length, which is very low, at around 1.6–1.8. However, the graph centrality measures are close to zero, indicating that no single region dominates all the others. This is also evident from Figure 3, which shows the adjacency graph resulting from the estimation of model 5.1–5.2 via block‐GLASSO with NUTS1 regions as blocks. We do not report the graph when using NUTS2 regions as blocks, because their number is too large for a readable display. It is interesting to observe that the most connected NUTS1 regions are also those with the highest per‐capita GVA, namely Greater London, Norway and South Netherlands, while the areas with fewer connections are Northern Ireland and northern areas of the United Kingdom, which are also geographically isolated from the other regions. Moreover, in most cases, regions from the same country are connected, supporting previous studies that use geographical contiguity or geographical distance as a proximity metric.
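As an illustration of these graph summaries, the sketch below computes the percentage of links, the average path length and Freeman's graph‐level degree centralization for a hypothetical five‐node star network, where centralization attains its maximum of one.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: a star on 5 nodes (node 0 is the hub), for which
# Freeman's graph-level degree centralization equals 1.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
n = 5
adj = {i: set() for i in range(n)}
for a, b in edges:
    adj[a].add(b); adj[b].add(a)

# Percentage of links: realized edges over the n(n-1)/2 possible ones.
density = len(edges) / (n * (n - 1) / 2)

def shortest_path_len(src, dst):
    """Breadth-first-search distance between two nodes."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[dst]

# Average path length over all unordered pairs of vertices.
avg_path = sum(shortest_path_len(a, b)
               for a, b in combinations(range(n), 2)) / (n * (n - 1) / 2)

# Freeman degree centralization: total deviation of node degrees from the
# maximum degree, normalized by the star graph's deviation (n-1)(n-2).
deg = [len(adj[i]) for i in range(n)]
centralization = sum(max(deg) - d for d in deg) / ((n - 1) * (n - 2))
print(density, avg_path, centralization)   # 0.4 1.6 1.0
```

The near‐zero centralization figures in Table 8, contrasted with this star‐graph benchmark of one, make concrete the claim that no single region dominates the estimated network.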
Adjacency graph of per‐capita GVA growth: 1980–2012. [Color figure can be viewed at wileyonlinelibrary.com]
6. Concluding Remarks
In the last few years, several methods have been proposed for reducing the dimensionality problem when estimating graphical models. These methods usually exploit a priori information on possible independence between groups of observations. In this paper, we focus on the estimation of a Gaussian graphical model with a large number of variables, where dependence between variables is block‐wise because of, for example, a hierarchical or group membership structure. We propose an estimation strategy based on the GLASSO methodology applied to group averages of observations, and we derive the large‐sample properties of the proposed estimator. Our Monte Carlo experiments show that the proposed estimator greatly outperforms the conventional GLASSO when data have a block‐wise dependence structure. These experiments also show that our procedure is robust to various deviations from block‐wise dependence: for example, the method still delivers valid inference when there is some within‐group variation, or under non‐normal errors. We have demonstrated the usefulness of this procedure in an empirical study of economic convergence of European regions, showing that accounting for block‐wise dependence helps us to better estimate convergence parameters. Although there are many examples in economics where group membership is given, in many others it is not, making the assumption of an a priori known block structure too restrictive. One interesting extension of this work would be to endogenously determine the assignment of units to groups, as well as the size and number of the groups, following the work by Lin and Ng (2012), Bonhomme and Manresa (2015) and Ando and Bai (2016). Future work should also consider a block‐wise structure for the covariance matrix of a VAR model, within the setting proposed by Barigozzi and Brownlees (2016) and Abegaz and Wit (2013).
Finally, while our approach does not allow us to estimate the covariance matrix arising from one or more common pervasive factors, it would be interesting to study the properties of an estimation procedure that first controls for common pervasive factors and then estimates the network structure using de‐factored residuals.
Acknowledgements
F. Moscone and E. Tosetti acknowledge financial support from the Engineering and Physical Sciences Research Council (EPSRC) grant, Semantic Credit Risk Assessment of Business Ecosystems (SCRIBE).
Footnotes
The matrix inversion lemma (Bernstein, 2005) states that, for conformable matrices A, U, C and V with the relevant inverses existing, (A + UCV)^{-1} = A^{-1} − A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1}.
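A quick numerical check of the lemma, in its standard Woodbury form and with hypothetical random matrices:

```python
import numpy as np

# Numerical check of the matrix inversion lemma (Woodbury identity):
# (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}.
rng = np.random.default_rng(1)
A = np.diag(rng.uniform(1.0, 2.0, 5))   # well-conditioned 5 x 5 matrix
U = rng.standard_normal((5, 2))
C = np.eye(2)
V = rng.standard_normal((2, 5))

lhs = np.linalg.inv(A + U @ C @ V)
Ainv = np.linalg.inv(A)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U) @ V @ Ainv
print(np.allclose(lhs, rhs))   # True
```

The lemma is useful in this setting because it reduces the inversion of a large matrix plus a low‐rank update to the inversion of much smaller matrices.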
When T is short, our approach can be used in combination with methods for estimating short dynamic panels, such as the generalized method of moments by Arellano and Bond (1991).
The countries included in the analysis are: Austria, Belgium, Germany, Denmark, Spain, Finland, France, Ireland, Italy, Netherlands, Norway, Portugal, Sweden and the United Kingdom.
Degree is the number of links for each unit, closeness is the inverse of the average length of the shortest paths to/from all the other vertices in the graph and betweenness is the number of times a node acts as a bridge between other nodes.