Abstract
We propose penalized likelihood methods for estimating the concentration matrix in the Gaussian graphical model. The methods lead to a sparse shrinkage estimator of the concentration matrix that is positive definite, and thus perform model selection and estimation simultaneously. The implementation of the methods is nontrivial because of the positive definite constraint on the concentration matrix, but we show that the computation can be done effectively by taking advantage of the efficient maxdet algorithm developed in convex optimization. We propose a BIC-type criterion for selecting the tuning parameter in the penalized likelihood methods. The connection between our methods and existing methods is illustrated. Simulations and real examples demonstrate the competitive performance of the new methods.
1 Introduction
Let X = (X(1), …, X(p)) be a p-dimensional random vector following a multivariate normal distribution with unknown mean μ and nonsingular covariance matrix Σ. Given a random sample X1, …, Xn of X, we wish to estimate the concentration matrix C = Σ−1. Of particular interest is the identification of zero entries in the concentration matrix C = (cij), since a zero entry cij = 0 indicates the conditional independence between the two random variables X(i) and X(j) given all other variables. This is the covariance selection problem (Dempster, 1972) or the model-selection problem in the Gaussian concentration graph model (Cox & Wermuth, 1996).
A Gaussian concentration graph model for the Gaussian random vector X is represented by an undirected graph G = (V, E), where V contains p vertices corresponding to the p coordinates and the edges describe the conditional independence relationships among X(1), …, X(p). The edge between X(i) and X(j) is absent if and only if X(i) and X(j) are independent conditional on the other variables, and corresponds to cij = 0. Thus parameter estimation and model selection in the Gaussian concentration graph model are equivalent to estimating parameters and identifying zeros in the concentration matrix; see Whittaker (1990), Lauritzen (1996) and Edwards (2000) for statistical properties of Gaussian concentration graph models and commonly used model selection and parameter estimation methods in such models.
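To make the correspondence between zero entries of C and conditional independence concrete, the following minimal sketch in Python uses an arbitrary 3 × 3 concentration matrix chosen for illustration: inverting C gives the implied covariance matrix Σ, and a zero in C need not correspond to a zero in Σ, so the zero pattern describes conditional rather than marginal independence.

```python
import numpy as np

# Hypothetical 3 x 3 concentration matrix with c13 = c31 = 0:
# X(1) and X(3) are conditionally independent given X(2).
C = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])

Sigma = np.linalg.inv(C)                    # implied covariance matrix
print(Sigma)                                # sigma_13 = 0.5: marginally dependent despite c13 = 0
print(np.round(np.linalg.inv(Sigma), 10))   # inverting back recovers the zero pattern of C
```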
The standard approach to model selection in Gaussian graphical models is greedy stepwise forward-selection or backward-deletion, and parameter estimation is based on the selected model. In each step the edge selection or deletion is typically done through hypothesis testing at some level α. It has long been recognized that this procedure does not correctly take account of the multiple comparisons involved (Edwards, 2000). Another drawback of the common stepwise procedure is its computational complexity. To remedy these problems, Drton & Perlman (2004) proposed a method that produces conservative simultaneous 1 − α confidence intervals, and uses these confidence intervals to do model selection in a single step. The method is based on asymptotic considerations. Meinshausen & Bühlmann (2006) proposed a computationally attractive method for covariance selection that can be used for very large Gaussian graphs. They perform neighbourhood selection for each node in the graph and combine the results to learn the structure of a Gaussian concentration graph model. They showed that their method is consistent for sparse high-dimensional graphs. In all of the above mentioned methods, model selection and parameter estimation are done separately. The parameters in the concentration matrix are typically estimated based on the model selected. As demonstrated by Breiman (1996), the discrete nature of such procedures often leads to instability of the estimator: small changes in the data may result in very different estimates. Other recent advances include a Duke University discussion paper by A. Dobra and M. West, who presented a novel Bayesian framework for building Gaussian graphical models and illustrated their approach in a large scale gene expression study, and Li & Gui (2006), who adopted gradient-directed regularization, which is described in a technical report by J. Friedman and B. Popescu, available at http://www-stat.stanford.edu/∼jhf, to construct sparse Gaussian graphical models.
In this paper, we propose a penalized-likelihood method that does model selection and parameter estimation simultaneously in the Gaussian concentration graph model. We employ an ℓ1 penalty on the off-diagonal elements of the concentration matrix. This is similar to the idea of the lasso in linear regression (Tibshirani, 1996). The ℓ1 penalty encourages sparsity and at the same time gives shrinkage estimates. In addition, we explicitly ensure that the estimator of the concentration matrix is positive definite. We also introduce a ‘nonnegative garrote’ type method that is closely related to the aforementioned approach.
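The penalized objective (3) is not reproduced in this excerpt; the sketch below assumes it takes the standard lasso-type form −log det C + tr(CA) + λ Σ_{i≠j} |c_ij|, where A denotes the sample covariance matrix and the diagonal of C is left unpenalized, as described above.

```python
import numpy as np

def lasso_type_objective(C, A, lam):
    """Assumed form of the penalized negative loglikelihood (up to constants):
    -log det(C) + tr(C A) + lam * sum_{i != j} |c_ij|,
    with A the sample covariance matrix and the diagonal of C not penalized.
    Equation (3) itself is not reproduced in this excerpt."""
    try:
        np.linalg.cholesky(C)            # simple positive definiteness check
    except np.linalg.LinAlgError:
        return np.inf                    # outside the positive definite cone
    _, logdet = np.linalg.slogdet(C)
    off_diag_l1 = np.abs(C).sum() - np.abs(np.diag(C)).sum()
    return -logdet + np.trace(C @ A) + lam * off_diag_l1
```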
There is a connection between the neighbourhood-selection method of Meinshausen & Bühlmann (2006) and our penalized-likelihood approach, which we illustrate in § 5. The neighbourhood-selection method can be cast as penalized M-estimation, with a particular quadratic form as the loss function, that incorporates neither the positive-definiteness nor the symmetry constraint. It is computationally faster because of its simpler form and because it ignores the positive-definiteness constraint; our method is statistically more efficient because it incorporates that constraint and uses the likelihood.
Throughout the paper we assume that the observations are suitably centred and scaled. The data are centred so that the sample mean is zero. One may scale so that the diagonal elements of the sample covariance matrix equal one, or so that the diagonal elements of the sample concentration matrix equal one. In our experience the two scalings give very similar performance; in this paper we adopt the latter, since it seems more natural for estimating the concentration matrix.
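As a concrete illustration of this preprocessing, the sketch below centres the data and rescales each variable so that the diagonal of the sample concentration matrix is one; it assumes n > p so that the sample covariance matrix is invertible.

```python
import numpy as np

def centre_and_scale(X):
    """Centre each column and rescale so that the diagonal elements of the
    sample concentration matrix equal one (assumes n > p)."""
    Xc = X - X.mean(axis=0)              # sample mean centred to zero
    S = np.cov(Xc, rowvar=False)         # sample covariance matrix
    K = np.linalg.inv(S)                 # sample concentration matrix
    d = np.sqrt(np.diag(K))
    return Xc * d                        # scaling X(j) by d_j makes the new k_jj = 1
```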
2 Methodology
2·1 Lasso-type estimator
2·2 Nonnegative garrote-type estimator
2·3 Illustration
Similarly, the garrote-type estimator can also be obtained in explicit form in this case.
The estimators are illustrated in Fig. 1. If the true value of c12 is zero, r will tend to be small in magnitude. With an appropriately chosen λ, both estimates of c12 can be shrunk to zero, so that model selection for the graphical model can be achieved. Note that the proposed estimators are continuous functions of r, and consequently of the data. Such continuity, not shared by existing methods that perform maximum likelihood estimation on a selected graph structure, ensures the stability of our estimators. The garrote-type estimator penalizes large values of r less heavily than small values. As will be demonstrated in the next section, this can be advantageous for model fitting. However, the disadvantage of the garrote-type estimator is that it can be applied only when a good initial estimator is available.
Fig. 1. (a) Lasso-type estimator and (b) garrote-type estimator for the case of p = 2.
3 Asymptotic Theory
In this section, we derive asymptotic properties of the proposed estimators that are analogous to those for the lasso (Knight & Fu, 2000). For simplicity, we assume that p is held fixed as the sample size n → ∞. Although it might be more realistic to consider the case when p → ∞ as n → ∞, the following results nevertheless provide an adequate illustration of the mechanism of the proposed estimators.
We simulated 1000 observations from the distribution of arg min V. Figure 2 gives the scatterplots of the off-diagonal elements for λ0 = 0, 0·5 and 1. When λ0 = 0, our estimator is asymptotically equivalent to the maximum likelihood estimator, and the asymptotic distribution of the elements of Ĉ is multivariate normal; see Fig. 2(a). If λ0 > 0, the proposed estimator has a positive probability of estimating c13 by its true value 0, and this probability increases with λ0. From Theorem 1, pr(ĉ13 = 0) tends to 0·30 when λ0 = 0·5 and to 0·45 when λ0 = 1.
Fig. 2. Example with p = 3: asymptotic distribution of our estimator as estimated from 1000 simulations, for (a) λ0 = 0, (b) λ0 = 0·5 and (c) λ0 = 1.
Similarly to Theorem 1, we can derive the asymptotic properties of the nonnegative garrote-type estimator.
Theorem 2. Denote by Ĉ the minimizer of (6) with a root-n consistent initial estimator. If the tuning parameter converges to zero at a suitable rate as n → ∞, then pr(ĉij = 0) → 1 whenever cij = 0, and the other elements of Ĉ have the same limiting distribution as the maximum likelihood estimator computed on the true graph structure.
Theorem 2 indicates that the garrote-type estimator enjoys the so-called oracle property: it selects the right graph with probability tending to one and at the same time gives a root-n consistent estimator of the concentration matrix.
4 Computation
4·1 The maxdet problem
The nonlinearity of the objective function and the positive-definiteness constraint make the optimization problem (3) nontrivial. We take advantage of the connection between (3) and the determinant-maximization problem, the maxdet problem (Vandenberghe et al., 1998), which can be solved very efficiently with the interior point algorithm.
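The maxdet interior-point code of Vandenberghe et al. (1998) is what the paper relies on; purely as an illustration, the same convex problem can be handed to a modern generic solver. The sketch below uses the cvxpy package and again assumes the objective has the lasso-type form discussed above, with A the sample covariance matrix.

```python
import numpy as np
import cvxpy as cp

def lasso_type_estimate(A, lam):
    """Solve the assumed lasso-type problem
    minimise -log det(C) + tr(A C) + lam * sum_{i != j} |c_ij|
    over positive definite C with a generic convex solver; a stand-in for the
    maxdet formulation, not the implementation used in the paper."""
    p = A.shape[0]
    C = cp.Variable((p, p), PSD=True)
    off_diag_l1 = cp.sum(cp.abs(C)) - cp.sum(cp.abs(cp.diag(C)))
    objective = cp.Minimize(-cp.log_det(C) + cp.trace(A @ C) + lam * off_diag_l1)
    cp.Problem(objective).solve()
    return C.value
```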
4·2 Algorithm for lasso-type estimator
Step 1. Choose an initial estimate Ĉold and initial signs sij ∈ {−1, 1} for all i ≠ j.
Step 2. Let Ĉnew solve (8) over the set of positive definite matrices.
Step 3. If Ĉnew = Ĉold, then stop and let Ĉ = Ĉnew. Otherwise, set Ĉold = Ĉnew and sij = −sij for all pairs (i, j) such that ĉij = 0 and go back to Step 2.
The above algorithm always terminates after a finite number of iterations, and it converges to the solution of (3).
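Since (8) is not reproduced in this excerpt, the following sketch only mirrors the loop structure of Steps 1–3; solve_signed_subproblem is a hypothetical routine standing in for the solver of (8) for a given sign matrix, and initialising the signs from an initial estimate is an assumption made for illustration.

```python
import numpy as np

def sign_iteration(C_init, solve_signed_subproblem, tol=1e-8):
    """Loop structure of Steps 1-3.  `solve_signed_subproblem(S)` is a
    hypothetical routine returning the minimizer of (8) over positive
    definite matrices for a given sign matrix S."""
    C_old = C_init
    S = np.sign(C_old)                              # Step 1 (assumed initialisation)
    S[S == 0] = 1.0
    while True:
        C_new = solve_signed_subproblem(S)          # Step 2
        if np.allclose(C_new, C_old, atol=tol):     # Step 3: fixed point reached
            return C_new
        zero = np.isclose(C_new, 0.0, atol=tol)     # flip signs where the estimate is zero
        S = np.where(zero, -S, S)
        C_old = C_new
```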
4·3 Tuning
So far we have concentrated on the calculation of the minimizer of (3) for a fixed tuning parameter t. In practice, the tuning parameter needs to be chosen to minimize a score measuring the goodness of fit. A commonly used score is the multifold crossvalidation score, but a computationally more efficient alternative is a BIC-type criterion. To evaluate the BIC in the current setting, one must first estimate the degrees of freedom, defined as the number of unknown parameters in the case of the maximum likelihood estimator.
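As an illustration of a BIC-type selection rule, the sketch below scores a fitted concentration matrix by −2 × loglikelihood + log(n) × df, taking the degrees of freedom to be the number of free parameters (the diagonal plus the distinct nonzero off-diagonal entries); the paper's own degrees-of-freedom estimate may differ, so this counting is an assumption.

```python
import numpy as np

def bic_score(C_hat, A, n):
    """BIC-type score for a fitted concentration matrix, with degrees of
    freedom counted as the diagonal plus the distinct nonzero off-diagonal
    entries; the paper's own degrees-of-freedom estimate may differ."""
    p = A.shape[0]
    _, logdet = np.linalg.slogdet(C_hat)
    neg2loglik = n * (-logdet + np.trace(C_hat @ A))        # up to an additive constant
    nonzero_offdiag = int((np.abs(C_hat[np.triu_indices(p, k=1)]) > 1e-8).sum())
    return neg2loglik + np.log(n) * (p + nonzero_offdiag)

# Selecting the tuning parameter on a grid (fit_at is a hypothetical routine
# returning the estimate for a given tuning parameter):
# lam_grid = np.linspace(0.0, 1.0, 21)
# best_lam = min(lam_grid, key=lambda lam: bic_score(fit_at(lam), A, n))
```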
5 Quadratic Approximation
Note that θii, i = 1, …, p, are not determined. For notational purposes, we write θii = 1 for i = 1, …, p. Recall that we scale each component of X so that all the diagonal elements of the sample concentration matrix are unity. The following lemma reveals a close connection between the approach of Meinshausen & Bühlmann (2006) and the second-order approximation (9).
Lemma 4 shows that the approach of Meinshausen & Bühlmann (2006) seeks a sparse C close to A−1. However, it does not incorporate the symmetry and positive-definiteness constraint in the estimation of the concentration matrix, and therefore an additional step is needed to estimate either the covariance matrix or the concentration matrix. Also, the loss function used by Meinshausen & Bühlmann is different from the quadratic approximation to the loglikelihood, and therefore the approach is expected to be less efficient than our penalized likelihood method or the corresponding quadratic approximation (9).
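For comparison, the neighbourhood-selection idea of Meinshausen & Bühlmann (2006) can be sketched as follows: each variable is lasso-regressed on all the others and the graph is read off the nonzero coefficients. The combination rule ('or' versus 'and') and the use of a single common penalty are simplifications made here for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighbourhood_selection(X, lam, rule="or"):
    """Lasso-regress each variable on all the others and read the graph off
    the nonzero coefficients; the 'or'/'and' combination rule and the single
    common penalty lam are simplifications for illustration."""
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        fit = Lasso(alpha=lam).fit(X[:, others], X[:, j])
        B[j, others] = fit.coef_
    present = B != 0
    return (present | present.T) if rule == "or" else (present & present.T)
```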
6 Simulation
We consider eight different models in our simulation; a sketch constructing the corresponding matrices follows the list.
Model 1. Heterogeneous model with Σ = diag(1, 2, …, p).
Model 2. An AR(1) model with cii = 1 and ci, i−1 = ci−1, i = 0·5.
Model 3. An AR(2) model with cii = 1, ci, i−1 = ci−1, i = 0·5 and ci, i−2 = ci−2, i = 0·25.
Model 4. An AR(3) model with cii = 1, ci, i−1 = ci−1, i = 0·4 and ci, i−2 = ci−2, i = ci, i−3 = ci−3, i = 0·2.
Model 5. An AR(4) model with cii = 1, ci, i−1 = ci−1, i = 0·4, ci, i−2 = ci−2, i = ci, i−3 = ci−3, i = 0·2 and ci, i−4 = ci−4, i = 0·1.
Model 6. Full model with cij = 2 if i = j and cij = 1 otherwise.
Model 7. Star model with every node connected to the first node, with cii = 1, c1, i = ci, 1 = 0·2 and cij = 0 otherwise.
Model 8. Circle model with cii = 1, ci, i−1 = ci−1, i = 0·5 and c1p = cp1 = 0·4.
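A sketch constructing these matrices might look as follows; Model 1 returns the covariance matrix Σ, and Models 2–8 return the concentration matrix C, following the specifications above.

```python
import numpy as np

def make_matrix(model, p):
    """Build the matrices of Models 1-8: Model 1 returns the covariance
    matrix Sigma, Models 2-8 return the concentration matrix C."""
    if model == 1:                                   # heterogeneous model
        return np.diag(np.arange(1.0, p + 1))
    C = np.eye(p)
    bands = {2: [0.5], 3: [0.5, 0.25], 4: [0.4, 0.2, 0.2], 5: [0.4, 0.2, 0.2, 0.1]}
    if model in bands:                               # AR(1)-AR(4) models
        for k, val in enumerate(bands[model], start=1):
            C += val * (np.eye(p, k=k) + np.eye(p, k=-k))
    elif model == 6:                                 # full model
        C = np.ones((p, p)) + np.eye(p)
    elif model == 7:                                 # star model
        C[0, 1:] = C[1:, 0] = 0.2
    elif model == 8:                                 # circle model
        C += 0.5 * (np.eye(p, k=1) + np.eye(p, k=-1))
        C[0, -1] = C[-1, 0] = 0.4
    return C
```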
Table 1. Results for the eight simulated models: averages over 100 runs, with standard errors in parentheses

| p | Model | Lasso KL | Lasso FP | Lasso FN | Garrote KL | Garrote FP | Garrote FN | MB KL | MB FP | MB FN | SIN (0·05) KL | SIN (0·05) FP | SIN (0·05) FN | SIN (0·25) KL | SIN (0·25) FP | SIN (0·25) FN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 1 | 0·27 (0·02) | 0·20 (0·06) | 0·00 (0·00) | 0·31 (0·02) | 0·42 (0·08) | 0·00 (0·00) | 0·45 (0·04) | 0·91 (0·08) | 0·00 (0·00) | 0·26 (0·02) | 0·05 (0·03) | 0·00 (0·00) | 0·32 (0·03) | 0·26 (0·07) | 0·00 (0·00) |
| 5 | 2 | 0·70 (0·05) | 3·31 (0·12) | 0·07 (0·03) | 0·67 (0·05) | 1·20 (0·12) | 0·14 (0·04) | 0·63 (0·05) | 0·68 (0·08) | 0·16 (0·04) | 1·88 (0·06) | 0·03 (0·02) | 2·47 (0·10) | 1·40 (0·06) | 0·15 (0·05) | 1·59 (0·09) |
| 5 | 3 | 0·89 (0·05) | 1·29 (0·10) | 2·24 (0·24) | 0·87 (0·04) | 0·60 (0·08) | 2·58 (0·18) | 0·98 (0·04) | 0·47 (0·06) | 3·68 (0·09) | 1·16 (0·04) | 0·01 (0·01) | 5·42 (0·12) | 1·06 (0·05) | 0·07 (0·04) | 4·16 (0·15) |
| 5 | 4 | 0·79 (0·03) | 0·22 (0·04) | 5·60 (0·29) | 0·80 (0·04) | 0·16 (0·04) | 5·86 (0·23) | 0·83 (0·04) | 0·16 (0·04) | 6·23 (0·11) | 0·93 (0·03) | 0·00 (0·00) | 8·14 (0·11) | 0·90 (0·03) | 0·01 (0·01) | 6·97 (0·15) |
| 5 | 5 | 0·78 (0·04) | 0·00 (0·00) | 7·06 (0·30) | 0·76 (0·03) | 0·00 (0·00) | 6·98 (0·23) | 0·80 (0·04) | 0·00 (0·00) | 7·26 (0·12) | 0·88 (0·03) | 0·00 (0·00) | 9·13 (0·11) | 0·86 (0·03) | 0·00 (0·00) | 7·94 (0·16) |
| 5 | 6 | 1·09 (0·04) | 0·00 (0·00) | 4·53 (0·44) | 1·11 (0·04) | 0·00 (0·00) | 4·58 (0·41) | 1·30 (0·04) | 0·00 (0·00) | 7·05 (0·11) | 1·28 (0·04) | 0·00 (0·00) | 6·18 (0·24) | 1·18 (0·05) | 0·00 (0·00) | 3·77 (0·25) |
| 5 | 7 | 0·45 (0·02) | 0·31 (0·08) | 3·47 (0·10) | 0·51 (0·03) | 0·46 (0·08) | 3·02 (0·12) | 0·61 (0·03) | 0·55 (0·07) | 2·75 (0·10) | 0·43 (0·02) | 0·00 (0·00) | 3·92 (0·03) | 0·50 (0·03) | 0·13 (0·05) | 3·61 (0·06) |
| 5 | 8 | 0·73 (0·05) | 2·55 (0·13) | 0·11 (0·03) | 0·77 (0·05) | 1·28 (0·12) | 0·26 (0·06) | 0·80 (0·05) | 0·17 (0·05) | 0·37 (0·06) | 1·89 (0·05) | 0·03 (0·02) | 3·29 (0·10) | 1·48 (0·05) | 0·11 (0·04) | 2·12 (0·09) |
| 10 | 1 | 0·22 (0·01) | 0·26 (0·09) | 0·00 (0·00) | 0·26 (0·01) | 0·75 (0·14) | 0·00 (0·00) | 0·63 (0·03) | 3·48 (0·17) | 0·00 (0·00) | 0·23 (0·01) | 0·07 (0·03) | 0·00 (0·00) | 0·26 (0·02) | 0·23 (0·05) | 0·00 (0·00) |
| 10 | 2 | 1·42 (0·04) | 31·76 (0·26) | 0·00 (0·00) | 0·60 (0·02) | 4·83 (0·27) | 0·00 (0·00) | 0·58 (0·02) | 2·25 (0·15) | 0·00 (0·00) | 4·01 (0·14) | 0·06 (0·02) | 3·75 (0·14) | 2·39 (0·12) | 0·25 (0·06) | 2·05 (0·12) |
| 10 | 3 | 1·22 (0·04) | 10·87 (0·59) | 3·30 (0·34) | 1·03 (0·03) | 5·86 (0·34) | 3·05 (0·21) | 1·52 (0·03) | 2·92 (0·15) | 7·07 (0·14) | 1·90 (0·03) | 0·07 (0·03) | 11·26 (0·20) | 1·60 (0·03) | 0·21 (0·05) | 8·91 (0·20) |
| 10 | 4 | 1·22 (0·04) | 3·37 (0·32) | 14·10 (0·52) | 1·07 (0·03) | 2·14 (0·19) | 12·64 (0·39) | 1·28 (0·03) | 1·95 (0·14) | 14·33 (0·20) | 1·68 (0·02) | 0·05 (0·02) | 20·80 (0·20) | 1·49 (0·03) | 0·15 (0·04) | 18·34 (0·27) |
| 10 | 5 | 1·21 (0·03) | 1·08 (0·18) | 23·34 (0·56) | 1·06 (0·03) | 0·98 (0·12) | 20·58 (0·47) | 1·23 (0·03) | 1·02 (0·09) | 21·19 (0·20) | 1·42 (0·02) | 0·02 (0·01) | 26·66 (0·22) | 1·30 (0·02) | 0·08 (0·03) | 24·13 (0·31) |
| 10 | 6 | 1·66 (0·01) | 0·00 (0·00) | 44·60 (0·16) | 1·66 (0·01) | 0·00 (0·00) | 44·30 (0·18) | 2·08 (0·02) | 0·00 (0·00) | 38·05 (0·20) | 2·10 (0·04) | 0·00 (0·00) | 17·29 (1·01) | 1·95 (0·05) | 0·00 (0·00) | 9·94 (0·79) |
| 10 | 7 | 0·71 (0·01) | 0·73 (0·15) | 7·61 (0·21) | 0·69 (0·02) | 2·14 (0·24) | 5·82 (0·25) | 0·97 (0·03) | 3·37 (0·16) | 4·77 (0·16) | 0·78 (0·01) | 0·07 (0·03) | 8·79 (0·05) | 0·81 (0·02) | 0·23 (0·05) | 8·29 (0·09) |
| 10 | 8 | 0·89 (0·04) | 19·24 (0·63) | 0·02 (0·02) | 0·65 (0·03) | 5·81 (0·30) | 0·03 (0·02) | 0·93 (0·02) | 3·58 (0·19) | 0·00 (0·00) | 6·83 (0·23) | 0·06 (0·02) | 4·50 (0·16) | 4·03 (0·18) | 0·25 (0·06) | 2·55 (0·13) |
MB, method of Meinshausen & Bühlmann (2006); SIN (0·05), method of Drton & Perlman (2004) based on a cut-off of 0·05; SIN (0·25), method of Drton & Perlman (2004) based on a cut-off of 0·25; KL, Kullback-Leibler loss defined in (6); FP, number of incorrectly identified edges; FN, number of incorrectly missed edges.
As shown in Table 1, the proposed penalized likelihood methods enjoy better performance than the other methods. The method of Meinshausen & Bühlmann (2006) and both versions of the Drton–Perlman method tend to have larger FN, which may partly explain their relatively poor performance. However, the results suggest that the proposed penalized likelihood approach combined with BIC may have relatively larger FP. The solution path of (3) may therefore be more informative in determining the graph structure.
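For reference, the FP and FN counts reported in Table 1 can be computed from the estimated and true concentration matrices as in the following sketch, which counts edges on the upper triangle only.

```python
import numpy as np

def edge_errors(C_hat, C_true, tol=1e-8):
    """FP: edges identified by the estimate that are absent in the true graph;
    FN: true edges missed by the estimate.  Edges are counted on the upper
    triangle only."""
    iu = np.triu_indices(C_true.shape[0], k=1)
    est = np.abs(C_hat[iu]) > tol
    true = np.abs(C_true[iu]) > tol
    return int(np.sum(est & ~true)), int(np.sum(~est & true))
```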
All four methods under comparison require the selection of tuning parameters that control the trade-off between sensitivity and specificity. To appreciate better the merits of different methods independently of the tuning parameter selection, we plotted the receiver operating characteristic curves for different models and methods, each averaged over the 100 simulated datasets. The AR(4) and full models are not included because the specificity in these cases is not well defined when p = 5. From the plot, not shown here because of space restrictions, the proposed methods outperform the other approaches for all models when p = 5 and n = 25. In the cases when p = 10 and n = 50, the performance of all methods improves but Lasso and Garrote still enjoy competitive performance when compared with the other approaches.
7 Real World Examples
We first consider three small-scale real-world examples. The cork borings data are presented in Whittaker (1990, p. 267) and were originally used by Rao (1948); the p = 4 measurements are the weights of cork borings on n = 28 trees in the four directions north, east, south and west. The Frets' heads dataset contains head measurements on the first and the second adult son in a sample of n = 25 families; the p = 4 variables are the head length and head breadth of the first son and the head length and head breadth of the second son. The data are also presented in Whittaker (1990, p. 255). The mathematics marks dataset (Mardia et al., 1979, p. 3) contains the marks of n = 88 students in p = 5 examinations, in mechanics, vectors, algebra, analysis and statistics; the data also appear in Whittaker (1990, p. 1).
Figures 3–5 depict the solution paths of (3) for each of the three datasets.
Fig. 3. Cork borings dataset. The Meinshausen–Bühlmann method selects (d); both methods based on SIN, with cut-offs 0·05 and 0·25, select (b); both Lasso and Garrote with BIC select (e).
Fig. 4. Frets' heads dataset. The Meinshausen–Bühlmann method selects (f); SIN with cut-off 0·05 selects (a); SIN with cut-off 0·25 selects (b); both Lasso and Garrote with BIC select (f).
Fig. 5. Mathematics marks dataset. The Meinshausen–Bühlmann method selects (f); SIN with cut-off 0·05 selects (b) with an additional edge between Mechanics and Vectors; SIN with cut-off 0·25 selects (f); Lasso with BIC selects (j); and Garrote with BIC selects (f).
Table 2. Small-scale examples: averaged loss estimated by fivefold crossvalidation

| Dataset | Lasso | Garrote | MB | SIN (0·05) | SIN (0·25) | Sample |
|---|---|---|---|---|---|---|
| Cork borings | 21·65 | 22·28 | 22·46 | 25·21 | 24·45 | 22·68 |
| Frets' heads | 18·68 | 18·33 | 20·15 | 21·10 | 21·22 | 20·00 |
| Maths marks | 29·52 | 29·53 | 29·83 | 30·66 | 30·26 | 29·84 |
Next we considered a larger problem. The opening prices of p = 35 stocks were collected for the years 2003 and 2004. The different methods were applied to estimate the covariance matrix using the data from 2003, the KL loss of each estimate was then evaluated using the data from 2004, and Table 3 reports the improvement in KL loss over the sample covariance matrix.
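The loss (6) is not reproduced in this excerpt; the sketch below uses a standard Kullback–Leibler-type discrepancy between an estimated concentration matrix and a reference covariance matrix (here the 2004 sample covariance), so the exact form is an assumption.

```python
import numpy as np

def kl_type_loss(C_hat, Sigma_ref):
    """A standard Kullback-Leibler-type discrepancy between an estimated
    concentration matrix and a reference covariance matrix; the exact loss
    (6) used in the paper is not reproduced here, so this form is an
    assumption."""
    p = Sigma_ref.shape[0]
    M = Sigma_ref @ C_hat
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p
```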
Table 3. Stock market example: improvement in predictive loss over the sample covariance matrix

| Dataset | Lasso | Garrote | MB | SIN (0·05) | SIN (0·25) |
|---|---|---|---|---|---|
| Stock market | 0·05 | 0·16 | −0·58 | −5·89 | −4·81 |
As shown in Tables 2 and 3, the proposed penalized likelihood methods enjoy very competitive performance.
The authors wish to thank the editor and two anonymous referees for comments that greatly improved the paper. This research was supported in part by grants from the U.S. National Science Foundation.
Appendix
Proofs
Simple matrix calculus shows that the matrix of second derivatives of the objective function in (8) is positive definite and therefore the objective function is strictly convex. Since the feasible region is compact, Ĉnew is always well defined. We now show that the algorithm will terminate in a finite number of iterations. Note that, at each iteration, Ĉold lies in the feasible region of Step 2. If the algorithm does not terminate, that is, at each step Ĉnew ≠ Ĉold, then the minimum attained at Step 2 is strictly smaller than that from the previous iteration. The minima attained in the iterations form a strictly decreasing sequence, which in turn implies that the sign matrix in (8) must be different for all iterations. However, this contradicts the fact that there are only a finite number, 2^{p(p−1)/2}, of possible choices for the sign matrix S. Therefore the algorithm has to terminate.