Sharp lower and upper bounds for the Gaussian rank of a graph
1. Introduction
An open problem in graphical Gaussian models is to determine the smallest number of observations needed to guarantee the existence of the maximum likelihood estimator (MLE) of the covariance matrix. This problem first arose in Dempster's paper [3] and has been frequently brought to attention by Steffen Lauritzen, as evident in [1] and in Lauritzen's lectures at the "Durham Symposium on Mathematical Aspects of Graphical Models 2008". Hence we refer to this problem as the D&L problem, using the initials of Dempster and Lauritzen. Note that the D&L problem as stated above is not well-posed, because the existence of the MLE cannot be guaranteed for all observations, no matter how large the number of observations is. To make it well-posed, a necessary compromise is to ignore a set of observations of probability zero and thus require the existence of the MLE only for observations in a set of probability one. It is now clear that the D&L problem in its most general form can be expressed as follows.
D&L problem (I). For a given graphical Gaussian model with respect to a graph $G$, determine the smallest number of observations needed to guarantee the existence of the MLE with probability one.
A slightly different well-posed formulation of the D&L problem is to require the existence of the MLE (of the covariance matrix, henceforth) for all generic observations in the following sense. Let $n \le p$. Then the observations¹ $y^{(1)}, \dots, y^{(n)} \in \mathbb{R}^p$ are said to be generic if each $n \times n$ principal submatrix of the sample covariance matrix $S$ is non-singular. Note that with probability one every $n$ observations are generic (thus the set of non-generic observations is of probability zero). This formulation can now be expressed as follows.
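Genericity in this sense is straightforward to check numerically. The following is a minimal sketch (the function name `is_generic`, the mean-zero convention and the use of NumPy are our own illustrative choices, not the paper's): it forms the sample covariance matrix and tests every $n \times n$ principal submatrix for non-singularity.

```python
import numpy as np
from itertools import combinations

def is_generic(Y):
    """Y: n x p matrix whose rows are mean-zero observations (n <= p).

    Returns True if every n x n principal submatrix of the sample
    covariance matrix S = Y^T Y / n is non-singular (up to floating-
    point tolerance), i.e. if the observations are generic.
    """
    n, p = Y.shape
    S = Y.T @ Y / n
    for A in combinations(range(p), n):
        if np.linalg.matrix_rank(S[np.ix_(A, A)]) < n:
            return False
    return True

# Continuously distributed observations are generic with probability one.
rng = np.random.default_rng(0)
print(is_generic(rng.standard_normal((3, 6))))  # True (almost surely)
```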
D&L problem (II). For a given graphical Gaussian model with respect to a graph $G$, determine the smallest number of observations needed to guarantee the existence of the MLE for all generic observations.
The number determined by the D&L problem (II) is said to be the Gaussian rank of $G$, denoted by $\mathrm{grank}(G)$. The primary goal of this paper is to obtain sharp lower and upper bounds on $\mathrm{grank}(G)$. In parallel to the D&L problem (II), the number determined by the D&L problem (I) is said to be the weak Gaussian rank of $G$, denoted by $\mathrm{wgrank}(G)$. Note again that being generic is a generic property of any $n$ observations; that is, with probability one, for every $n$ observations each $n \times n$ principal submatrix of the sample covariance matrix is non-singular. Thus obviously $\mathrm{wgrank}(G) \le \mathrm{grank}(G)$ for every graph $G$.
As we mentioned earlier, the D&L problem first arose in [3], a celebrated work of Dempster in which, under the principle of parsimony in fitting parametric models, Dempster introduced graphical Gaussian models and gave a partial analytic expression for the MLE of the covariance matrix, assuming that the MLE exists. In practice, the MLE has to be computed by an iterative numerical procedure, such as the iterative proportional scaling (IPS) procedure [16]. However, if the number of observations is not sufficiently large, there is no guarantee that the output matrix will indeed correspond to the MLE, as it might not be positive definite. It is clear that the number determined by the D&L problem under either formulation depends on the level of sparsity in the graph, but the exact nature of this dependency is far from clear, mainly because sparse graphs are not well understood; in particular, it is not known which graph parameters best measure graph sparsity. Hence the D&L problems (I) and (II) are important not only in relation to maximum likelihood estimation but also for gaining deeper insight into sparse graphs. In practice, the D&L problem (II) is equivalent to the D&L problem (I), because in general the data can be expected to be generic, and therefore $n \ge \mathrm{grank}(G)$ is a condition that minimally guarantees the existence of the MLE.
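To illustrate the computational side of the problem, here is a minimal sketch of an IPS-style iteration for a graphical Gaussian model, in the spirit of [16]; the function name, the convergence criterion and the caller-supplied list of complete sets are our own simplifications, not the paper's. Each sweep adjusts the concentration matrix $K$ so that the implied covariance matches the sample covariance on one complete vertex set at a time.

```python
import numpy as np

def ips(S, cliques, p, tol=1e-8, max_iter=500):
    """Iterative-proportional-scaling sketch for a graphical Gaussian model.

    S       : p x p sample covariance matrix (only clique blocks are read)
    cliques : complete vertex sets, as index lists, covering all edges of G
    Returns a concentration matrix K whose inverse matches S on every
    clique, or None when the iteration fails, e.g. because the MLE does
    not exist for the given data.
    """
    K = np.eye(p)
    try:
        for _ in range(max_iter):
            K_old = K.copy()
            for c in cliques:
                d = [i for i in range(p) if i not in c]
                if d:
                    A = K[np.ix_(c, d)] @ np.linalg.solve(K[np.ix_(d, d)],
                                                          K[np.ix_(d, c)])
                else:
                    A = 0.0
                # After this update the implied covariance (K^{-1}) agrees
                # with S on the block indexed by c.
                K[np.ix_(c, c)] = np.linalg.inv(S[np.ix_(c, c)]) + A
            if np.max(np.abs(K - K_old)) < tol:
                return K
    except np.linalg.LinAlgError:
        return None
    return None
```

If the iteration fails to converge, or the limit fails to be positive definite, this is the numerical signature that the MLE does not exist for the given data.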
In a graphical Gaussian model the sparsity is given by a pattern of zeros in the inverse covariance matrix. The pattern of zeros is determined by the missing edges of a graph, and each zero entry of the inverse covariance matrix indicates that the corresponding pair of variables is conditionally independent given the remaining variables. An attractive feature of graphical Gaussian models, or graphical models in general, is that if the graph representing the model is sparse, then the MLE of the covariance matrix can exist even if the number of observations is much smaller than the dimension of the multivariate Gaussian distribution. Intuitively, we expect the Gaussian rank of the graph, the number determined by the D&L problem, to decrease as the graph becomes sparser. However, it is not clear what best measures the sparsity of a graph or how the Gaussian rank varies accordingly. The main theorem of this paper suggests two such measures of sparsity.
Despite some efforts since the 1990s, not much progress has been made on the D&L problem. The existing results are limited to a handful of publications, as follows.
- (R1) [5] For a decomposable graph $G$, that is, a graph that has no induced cycle of length four or more, the Gaussian rank is equal to $\omega(G)$, the size of the largest complete subgraph of $G$.
- (R2) [5] For every graph $G$, $\mathrm{grank}(G) \le \tau(G) + 1$, where $\tau(G)$ denotes the treewidth of $G$ (see [7] for the definition).
- (R3) [1] For a cycle $C_n$ of length $n \ge 3$, $\mathrm{grank}(C_n) = 3$.
- (R4) [17] For $G_{3 \times 3}$, the 3×3 grid, $\mathrm{grank}(G_{3 \times 3}) = 3$.
Theorem 1.1. Let $G$ be a graph. Then
(1.1) $\kappa^*(G) + 1 \le \mathrm{grank}(G) \le \delta^*(G) + 1$,
where $\kappa^*(G)$ and $\delta^*(G)$ denote the subgraph connectivity number and the degeneracy number of $G$, defined in Section 2.
All the results stated in (R1) through (R4) now immediately follow from Theorem 1.1, since by some simple calculations (see also the computational sketch following this list) we can show that
- (a) $\kappa^*(G) = \delta^*(G) = \omega(G) - 1$ for a decomposable graph $G$;
- (b) $\delta^*(G) \le \tau(G)$ for any (arbitrary) graph $G$;
- (c) for a cycle $C_n$ (of any length $n \ge 3$), $\kappa^*(C_n) = \delta^*(C_n) = 2$;
- (d) for a grid $G_{m \times n}$ (with $m \ge 2$ and $n \ge 2$), $\kappa^*(G_{m \times n}) = \delta^*(G_{m \times n}) = 2$.
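The degeneracy number is easy to compute: repeatedly deleting a vertex of minimum degree and recording the largest minimum degree encountered yields $\delta^*(G)$ [15]. The following minimal sketch (our own helper, with graphs given as adjacency dictionaries) confirms the value 2 claimed in items (c) and (d) for a cycle and a grid.

```python
def degeneracy(adj):
    """Degeneracy number delta*(G): repeatedly delete a vertex of minimum
    degree; delta* is the largest minimum degree encountered along the way."""
    adj = {v: set(ns) for v, ns in adj.items()}
    d_star = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        d_star = max(d_star, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return d_star

# A cycle of length 8 and a 3x3 grid are both 2-degenerate, so
# Theorem 1.1 pins down grank = 3 in both cases (items (c) and (d)).
cycle = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
grid = {(i, j): {(i + di, j + dj)
                 for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= i + di < 3 and 0 <= j + dj < 3}
        for i in range(3) for j in range(3)}
print(degeneracy(cycle), degeneracy(grid))  # 2 2
```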
This paper is structured as follows. In Section 2 we first introduce the required preliminaries and notation from graph theory, matrix algebra and graphical Gaussian models, and then explain the MLE problem for graphical Gaussian models. In Section 3 we give an alternative description of the Gaussian rank and derive some of its properties. In Section 4 we prove Theorem 1.1. In Section 5 we apply Theorem 1.1 to some special graphs, including symmetric graphs and random graphs, to exactly determine their Gaussian ranks. We also obtain a tight numerical upper bound for the Gaussian ranks of planar graphs.
2. Preliminaries
In this section, we establish some necessary notation, terminology and definitions in graphical models, matrix algebra and geometric graph theory. We also carefully explain how the maximum likelihood problem for graphical Gaussian models leads to the D&L problem.

2.1. Graph theoretical notions, definitions

Our notation here closely follows the notation established in [15], [10]. Let $G = (V, E)$ be an undirected graph, where $V$ is the vertex set and $E$ is the edge set of $G$. Each edge is an unordered pair $\{u, v\}$, where $u, v \in V$. For ease of notation we assume that $E$ contains all the self-loops, that is, $\{v, v\} \in E$ for every $v \in V$. Unless otherwise stated, we always assume that $V = \{1, \dots, p\}$. For a vertex $v$, the set and the number of vertices adjacent to $v$ are denoted by $\operatorname{ne}(v)$ and $\deg(v)$, respectively. A graph $G' = (V', E')$ is a subgraph of $G$, denoted by $G' \subseteq G$, if $V' \subseteq V$ and $E' \subseteq E$. For a set $A \subseteq V$, the graph $G_A = (A, E_A)$ defined by $E_A = \{\{u, v\} \in E : u, v \in A\}$ is said to be the subgraph of $G$ induced by $A$. For a vertex $v$, the induced subgraph $G_{V \setminus \{v\}}$ is denoted by $G - v$. Note that $G - v$ is simply obtained by removing $v$ and its incident edges from $G$. For an edge $e$, $G - e$ denotes the subgraph obtained by removing $e$ from $G$, that is, $G - e = (V, E \setminus \{e\})$.

Let $\mathcal{G}$ denote the set of all graphs. A graph parameter is a function $f : \mathcal{G} \to \mathbb{R}$. In words, a graph parameter is a function that assigns a number to each graph. In graph theory two common graph parameters are:
$\delta(G) = \min\{\deg(v) : v \in V\}$, said to be the minimum degree of $G$;
$\kappa(G)$, said to be the vertex connectivity number of $G$. In words, $\kappa(G)$ is the smallest number of vertices whose deletion disconnects the graph or makes it trivial.

Example 2.1. Consider the graphs given by Fig. 1(a)–(c). The graph denoted by $G_1$ in Fig. 1(a) is a 4×5 grid. It is easy to see that $\kappa(G_1) = \delta(G_1) = 2$. In fact, for every $m \times n$ grid (with $m \ge 2$ and $n \ge 2$), $\kappa = \delta = 2$. The graph denoted by $G_2$ in Fig. 1(b) is a cycle of length 8, with $\kappa(G_2) = \delta(G_2) = 2$. This is obviously true for any arbitrary cycle (of length at least 3). The graph denoted by $G_3$ in Fig. 1(c) is a complete (4,3)-bipartite graph. In general, a complete $(m,n)$-bipartite graph $K_{m,n}$ is a graph whose vertex set can be partitioned into two sets with $m$ and $n$ elements such that two vertices are adjacent if and only if they are in different parts of the partition. One can check that $\kappa(K_{m,n}) = \delta(K_{m,n}) = \min(m, n)$ [15].

Fig. 1. Denoted graphs are (a) a 4×5 grid, (b) a cycle of length 8, (c) a complete (4,3)-bipartite graph $K_{4,3}$.

2.2. Graph parameters

Now for every graph parameter $f$ we can define a new graph parameter $f^*$ as
$$f^*(G) = \max\{f(G') : G' \subseteq G\}.$$
Two such defined graph parameters are $\delta^*(G)$ and $\kappa^*(G)$, known as the graph degeneracy number and the subgraph connectivity number of $G$, respectively [15]. Note that $\kappa^*(G)$ is the smallest number of vertex deletions sufficient to disconnect each subgraph of $G$.

Remark 2.1. These simple facts can be found in [15].
(a) If two graph parameters $f, g$ are such that $f(G') \le g(G')$ for every subgraph $G' \subseteq G$, then $f^*(G) \le g^*(G)$.
(b) If a graph parameter $f$ is non-decreasing, that is, $f(G') \le f(G)$ whenever $G' \subseteq G$, then $f^* = f$. For example, for this reason $\omega^* = \omega$ and $\tau^* = \tau$.
(c) For every graph we have $\kappa(G) \le \delta(G)$. In light of (a) and (b), this implies that $\kappa^*(G) \le \delta^*(G)$.
(d) There are algorithms (specifically given in [15]) that compute $\delta^*(G)$ and $\kappa^*(G)$, respectively, in polynomially many steps (in contrast, computing the clique number or the treewidth is NP-hard).

Since $\kappa(G') \le \delta(G')$ for every graph $G'$, we have
(2.1) $\kappa^*(G) \le \delta^*(G)$.

2.3. Matrix algebra notation, definitions

The notation presented here closely follows the notation established in [19], [15]. For a vector $x \in \mathbb{R}^p$ and $A \subseteq \{1, \dots, p\}$, let $x_A$ denote the subvector $(x_i)_{i \in A}$. For a matrix $M$ and $A, B \subseteq \{1, \dots, p\}$, let $M_{A \times B}$ denote the submatrix $(M_{ij})_{i \in A,\, j \in B}$. A principal submatrix $M_{A \times A}$ is simply denoted by $M_A$. Some other notations used throughout the paper are as follows. The set of $p \times p$ symmetric matrices is denoted by $\mathcal{S}_p$. The set of (symmetric) positive semi-definite matrices is denoted by $\mathcal{S}_p^{\ge 0}$, and the set of (symmetric) positive definite matrices by $\mathcal{S}_p^{>0}$. The set of matrices in $\mathcal{S}_p^{\ge 0}$ of a fixed rank $n$ is denoted by $\mathcal{S}_p^{\ge 0}(n)$. We write $M \succeq 0$ (or $M \succ 0$) to indicate that $M$ is a positive semidefinite (or positive definite) matrix without specifying its dimension. The Moore–Penrose generalized inverse of $M$ is denoted by $M^{\dagger}$. Note that $M M^{\dagger} M = M$ and $M^{\dagger} M M^{\dagger} = M^{\dagger}$. It is clear that $M^{\dagger} = M^{-1}$ when $M$ is non-singular. Let $M \in \mathcal{S}_p$ be partitioned as
$$M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix},$$
where $M_{11}$ and $M_{22}$ are square. Then the Schur complement of $M_{22}$ in $M$, denoted by $M / M_{22}$, is defined by
$$M / M_{22} = M_{11} - M_{12} M_{22}^{\dagger} M_{21}.$$
Convention: for convenience in subsequent sections, unless otherwise stated, we always assume that given matrices are non-zero.

2.4. General-position vectors and matrices

Let $x^{(1)}, \dots, x^{(m)}$ be vectors in $\mathbb{R}^n$, with $m \ge n$. These vectors are said to be in general position in $\mathbb{R}^n$ if any $n$ vectors among them are linearly independent. Also, a square matrix $B \in \mathcal{S}_p^{\ge 0}$ of rank $n$ is said to be in general position if every $n \times n$ principal submatrix of $B$ is non-singular. Next we proceed with two lemmas and a corollary due to Eaton and Perlman [4] and Malley [13]. For completeness, and because the notation under consideration is slightly different, we prove these results in the Appendix.

Lemma 2.1 ([4], [13]). If $m$ vectors are randomly selected in $\mathbb{R}^n$ (with a probability distribution dominated by Lebesgue measure), then with probability one they are in general position.

Lemma 2.2 ([4], [13]). Let $B \in \mathcal{S}_p^{\ge 0}(n)$. Then $B$ is in general position if and only if there are general-position vectors $x^{(1)}, \dots, x^{(p)} \in \mathbb{R}^n$ such that
(2.2) $B_{ij} = \langle x^{(i)}, x^{(j)} \rangle$ for all $i, j \in \{1, \dots, p\}$.

Corollary 2.1. Suppose $y^{(1)}, \dots, y^{(n)}$ are observations from a $p$-variate distribution dominated by Lebesgue measure. Then with probability one the sample covariance matrix and the sample correlation matrix are in general position.

A useful property of a general-position matrix is that in some sense it can be extended to another general-position matrix in $\mathcal{S}_p^{\ge 0}(n+1)$ or $\mathcal{S}_{p+1}^{\ge 0}(n+1)$. The next lemma formalizes this fact.

Lemma 2.3. Let $B \in \mathcal{S}_p^{\ge 0}(n)$ be in general position.
(a) There exist a non-zero vector $u \in \mathbb{R}^p$, a vector $w \in \mathbb{R}^p$ and a scalar $c > 0$ such that $B + u u^{\top}$ and $\begin{pmatrix} B + u u^{\top} & w \\ w^{\top} & c \end{pmatrix}$ are general-position matrices, respectively, in $\mathcal{S}_p^{\ge 0}(n+1)$ and $\mathcal{S}_{p+1}^{\ge 0}(n+1)$.
(b) If $n < p$, then there exists a unit vector $z \in \mathbb{R}^n$ such that $z$, together with the general-position vectors $x^{(1)}, \dots, x^{(p)}$ representing $B$ as in Eq. (2.2), is in general position. This in return implies that the matrix
$$\begin{pmatrix} B & c \\ c^{\top} & 1 \end{pmatrix}, \qquad c_i = \langle x^{(i)}, z \rangle,$$
is a general-position matrix in $\mathcal{S}_{p+1}^{\ge 0}(n)$.

Proof. (a) Write $B = \sum_{i=1}^{n} x_i x_i^{\top}$ as in Eq. (A.1) and choose a non-zero vector $u \notin \operatorname{col}(B)$, so that $\operatorname{rank}(B + u u^{\top}) = n + 1$ by Theorem 1.20 in [19]. For each $A \subseteq V$ with $|A| = n + 1$, the determinant of $(B + u u^{\top})_A$ is a polynomial in the entries of $u$ that is not identically zero; a Schur complement computation, together with the fact that $B$ is in general position, verifies the latter. The zero sets of these finitely many polynomials are algebraic sets of measure zero, so $u$ can be chosen outside their union, and then $B + u u^{\top}$ is in general position. The second half of Part (a) is proved similarly, by bordering $B + u u^{\top}$ with a suitable vector $w$ and scalar $c$.
(b) Since $x^{(1)}, \dots, x^{(p)}$ span $\mathbb{R}^n$, appending any vector $z \in \mathbb{R}^n$ leaves the rank of the associated Gram matrix equal to $n$ [9]. Now it suffices to find a unit vector $z$ such that any $n$ vectors among $x^{(1)}, \dots, x^{(p)}, z$ are linearly independent. The determinants of the corresponding $n \times n$ matrices are polynomials in $z$ that are not identically zero, and their zero sets are algebraic sets of measure zero. Thus we can choose $z$ in the non-empty complement of their union, and for this vector the bordered matrix is in general position by Lemma 2.2. The rest of the proof is similar to that of Part (a). □

Remark 2.2. As the proof shows, Part (a) of Lemma 2.3 holds in particular for $w = \sqrt{c}\, u$, where $c > 0$ and $u$ satisfies the requirements of the first half of Part (a); in this case the bordered matrix equals $\operatorname{diag}(B, 0) + t t^{\top}$ with $t = (u^{\top}, \sqrt{c})^{\top}$.

2.5. Graphical Gaussian models

Let $G$ be a graph with $p$ vertices. A Gaussian distribution $\mathrm{N}_p(\mu, \Sigma)$ is said to be Markov with respect to $G$ if $(\Sigma^{-1})_{uv} = 0$ whenever $\{u, v\} \notin E$. The graphical Gaussian model over $G$, denoted by $\mathcal{N}(G)$, is the family of Gaussian distributions that are Markov with respect to $G$. Let
$$\mathcal{Z}(G) = \{K \in \mathcal{S}_p : K_{uv} = 0 \text{ for each } \{u, v\} \notin E\} \quad \text{and} \quad \mathcal{P}(G) = \mathcal{Z}(G) \cap \mathcal{S}_p^{>0}.$$
Then $\mathcal{Z}(G)$ is an $|E|$-dimensional linear space and $\mathcal{P}(G)$ is an open convex cone in $\mathcal{Z}(G)$. Note that $\mathcal{P}(G)$ is in fact the set of inverse covariance matrices for the graphical Gaussian model $\mathcal{N}(G)$.

2.6. The maximum likelihood problem

Suppose $y^{(1)}, \dots, y^{(n)}$ are observations taken from a distribution in $\mathcal{N}(G)$. Then the MLE of $\Sigma$ is the solution to the following optimization problem [8]:
$$\max \left\{ \log \det(\Sigma^{-1}) - \operatorname{tr}(S \Sigma^{-1}) : \Sigma^{-1} \in \mathcal{P}(G) \right\}.$$
Recall that $S$ is the sample covariance matrix. In terms of the inverse covariance matrix $K = \Sigma^{-1}$, this optimization problem can be recast as the convex optimization problem
(2.3) $\min \left\{ \operatorname{tr}(K S) - \log \det K : K \in \mathcal{P}(G) \right\}$.
A standard optimization procedure, such as in [8] or [2], shows that the dual of (2.3) is the following concave optimization problem:
(2.4) $\max \left\{ \log \det \Sigma : \Sigma \in \mathcal{S}_p^{>0},\ \Sigma_{ij} = S_{ij} \text{ for each } \{i, j\} \in E \right\}$,
which has a unique solution if and only if the feasible set determined by Eq. (2.4) is non-empty. Note that the existence of a matrix that satisfies the condition given by Eq. (2.4) is essentially a positive definite completion problem. For future reference we record the conclusion as follows.

Proposition 2.1. Suppose $y^{(1)}, \dots, y^{(n)}$ are observations taken from a distribution in $\mathcal{N}(G)$. Then the MLE of $\Sigma$ exists if and only if there is a matrix $\Sigma' \in \mathcal{S}_p^{>0}$ such that $\Sigma'_{ij} = S_{ij}$ for every $\{i, j\} \in E$.
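To make the completion condition of Proposition 2.1 concrete, consider the following small worked example (ours, not the paper's). Let $G$ be the path $1 - 2 - 3$, so that the edge $\{1, 3\}$ is missing and $\omega(G) = 2$:
$$S = \begin{pmatrix} 1 & a & \ast \\ a & 1 & b \\ \ast & b & 1 \end{pmatrix}, \qquad \Sigma' = \begin{pmatrix} 1 & a & ab \\ a & 1 & b \\ ab & b & 1 \end{pmatrix}, \qquad \det \Sigma' = (1 - a^2)(1 - b^2),$$
where $a$ and $b$ are the sample correlations on the two edges and $\ast$ marks the unconstrained entries. The completion $\Sigma'$ is positive definite if and only if $|a| < 1$ and $|b| < 1$, and if $|a| = 1$ or $|b| = 1$ no positive definite completion exists at all. With $n = 1$ observation the sample correlations satisfy $|a| = |b| = 1$, whereas $n = 2$ generic observations give $|a|, |b| < 1$; this recovers $\mathrm{grank}(G) = 2 = \omega(G)$, consistent with (R1).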
3. The Gaussian rank of a graph
In this section we give another description of the Gaussian rank of a graph, using the positive definite completion problem given by Eq. (2.4). An advantage of this alternative description is that it can be verified more easily and thus used to derive the properties of the Gaussian rank as a graph parameter.

3.1. An alternative description of the Gaussian rank

Let $G$ be a graph. Two matrices $A, B \in \mathcal{S}_p$ are said to match on $G$, denoted by $A \stackrel{G}{=} B$, if $A_{ij} = B_{ij}$ for each $\{i, j\} \in E$. In other words, two matrices match on $G$ if their Euclidean projections onto $\mathcal{Z}(G)$ are identical. The matching relation is additive, and invariant under diagonal scaling [1]. More precisely,
(3.1) if $A \stackrel{G}{=} B$ and $C \stackrel{G}{=} D$, then $A + C \stackrel{G}{=} B + D$;
(3.2) if $A \stackrel{G}{=} B$, then $\Lambda A \Lambda \stackrel{G}{=} \Lambda B \Lambda$ for every diagonal matrix $\Lambda$.
Let us recall that by definition $\mathrm{grank}(G)$ is the smallest number $n$ with the property that for every $n$ observations, if the sample covariance matrix is of rank $n$ and in general position, then the MLE exists. In light of Proposition 2.1, an alternative description of $\mathrm{grank}(G)$ can be given as follows.

Proposition 3.1. Let $G$ be a graph. Then $\mathrm{grank}(G)$ is the smallest number $n$ with the following property: for every $B \in \mathcal{S}_p^{\ge 0}(n)$, if $B$ is in general position, then there is a matrix $\Sigma \in \mathcal{S}_p^{>0}$ such that $\Sigma \stackrel{G}{=} B$.

Remark 3.1. Similarly, $\mathrm{wgrank}(G)$ is the smallest number $n$ with the following property: almost surely for $B \in \mathcal{S}_p^{\ge 0}(n)$, if $B$ is in general position, then there is a matrix $\Sigma \in \mathcal{S}_p^{>0}$ such that $\Sigma \stackrel{G}{=} B$.

3.2. Some basic properties of the Gaussian rank

In this subsection we list some basic properties of the Gaussian rank as a graph parameter. Let us keep in mind that $\mathrm{grank}(G) \le n$, for some positive integer $n$, if the condition in Proposition 3.1 is satisfied for $n$. Also, because of the scale-invariance property of the matching relation given by Eq. (3.2), whenever convenient we can assume that $B$ is a correlation matrix, that is, the diagonal entries of $B$ are all 1.

Proposition 3.2. Let $n \ge \mathrm{grank}(G)$. Then for every general-position matrix $B \in \mathcal{S}_p^{\ge 0}(n)$ there is a matrix $\Sigma \in \mathcal{S}_p^{>0}$ such that $\Sigma \stackrel{G}{=} B$.

Proof. Let $B = \sum_{i=1}^{n} x_i x_i^{\top}$ be a decomposition of $B$ as in Eq. (A.1), and set $m = \mathrm{grank}(G)$, $B_1 = \sum_{i=1}^{m} x_i x_i^{\top}$ and $B_2 = \sum_{i=m+1}^{n} x_i x_i^{\top}$. Since $B$ is in general position, the matrix $B_1$ is also in general position. Thus there is a matrix $\Sigma_1 \in \mathcal{S}_p^{>0}$ such that $\Sigma_1 \stackrel{G}{=} B_1$. Eq. (3.1) now shows $\Sigma_1 + B_2 \stackrel{G}{=} B_1 + B_2 = B$, and $\Sigma_1 + B_2 \in \mathcal{S}_p^{>0}$, as desired. □

Proposition 3.3. Let $v$ be a vertex of $G$. Then the following holds.
(a) $\mathrm{grank}(G) \le \mathrm{grank}(G - v) + 1$.
(b) $\mathrm{grank}(G) = \mathrm{grank}(G - v) + 1$ if $v$ is adjacent to all other vertices.

Proof. Without loss of generality we assume that $v = p$, and we set $m = \mathrm{grank}(G - v)$.
(a) The claim is that $\mathrm{grank}(G) \le m + 1$. Let $B \in \mathcal{S}_p^{\ge 0}(m+1)$ be in general position; we may assume that $B$ is a correlation matrix, partitioned as
$$B = \begin{pmatrix} B_{11} & b \\ b^{\top} & 1 \end{pmatrix}, \qquad B_{11} = B_{V \setminus \{v\}}.$$
By using the Guttman rank additivity formula in [19] and the fact that $B$ is in general position, we conclude that $B_{11}$ is in general position of rank at least $m$. Therefore, by Proposition 3.2, there is a matrix $\Sigma_1 \in \mathcal{S}_{p-1}^{>0}$ such that $\Sigma_1 \stackrel{G - v}{=} B_{11}$. Bordering $\Sigma_1$ by a vector that agrees with $b$ on $\operatorname{ne}(v)$, as in Part (a) of Lemma 2.3, yields a matrix $\Sigma \in \mathcal{S}_p^{>0}$, and the additive property of the matching relation given by Eq. (3.1) shows that $\Sigma \stackrel{G}{=} B$; therefore $\mathrm{grank}(G) \le m + 1$.
(b) Suppose $v$ is adjacent to all other vertices. By Part (a) above, $\mathrm{grank}(G) \le m + 1$; thus it suffices to show that $\mathrm{grank}(G) > m$. For this, let $B' \in \mathcal{S}_{p-1}^{\ge 0}(m-1)$ be a general-position matrix for which there is no $\Sigma' \in \mathcal{S}_{p-1}^{>0}$ with $\Sigma' \stackrel{G - v}{=} B'$. By Part (b) of Lemma 2.3 there is a vector $z$ such that $B'$ extends to a general-position matrix $B \in \mathcal{S}_p^{\ge 0}$ whose row and column indexed by $v$ are determined by $z$. If there were a matrix $\Sigma \in \mathcal{S}_p^{>0}$ with $\Sigma \stackrel{G}{=} B$, then, since $v$ is adjacent to all other vertices, taking the Schur complement with respect to the $(v, v)$ entry would produce a matrix in $\mathcal{S}_{p-1}^{>0}$ matching $B'$ on $G - v$, a contradiction. Thus $\mathrm{grank}(G) > m$, and consequently $\mathrm{grank}(G) = m + 1$. □

Corollary 3.1. Let $G'$ be a subgraph of $G$. Then $\mathrm{grank}(G') \le \mathrm{grank}(G)$.

Proof. Every subgraph of $G$ is obtained by successively removing a finite number of vertices and edges from $G$. Therefore, it suffices to show that $\mathrm{grank}(G - v) \le \mathrm{grank}(G)$ for each vertex $v$, and $\mathrm{grank}(G - e) \le \mathrm{grank}(G)$ for each edge $e$. The latter is obvious using the condition in Proposition 3.1. □

A graph $G$ is said to be the clique sum of two subgraphs $G_1$ and $G_2$ if $G = G_1 \cup G_2$ and $G_1 \cap G_2$ is a complete graph (including the empty graph). We write this as $G = G_1 \oplus G_2$. The following proposition is now immediate using a standard completion process given by [5] or [10].

Proposition 3.4. Suppose $G = G_1 \oplus G_2$. Then $\mathrm{grank}(G) = \max\{\mathrm{grank}(G_1), \mathrm{grank}(G_2)\}$. In particular, if $G_1, \dots, G_k$ are all the connected components of $G$, then
$$\mathrm{grank}(G) = \max_{1 \le i \le k} \mathrm{grank}(G_i).$$

Remark 3.2. In relation to the D&L problem (I), we can show (by slightly modifying our proofs and using the alternative description of $\mathrm{wgrank}(G)$ given by Remark 3.1) that the properties of the Gaussian rank discussed in this subsection are also valid properties of the weak Gaussian rank. This fact, however, will not be used in the paper.
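The alternative description in Proposition 3.1 can be probed numerically through the completion problem of Eq. (2.4). Reusing the hypothetical `ips` sketch from the introduction with the edges of a 4-cycle as the complete sets, the following toy check (our own numbers, not the paper's) contrasts a partial correlation matrix that admits a positive definite completion with one that does not: for the second matrix, the triangle inequality for angles between unit vectors rules out any positive definite completion, so by Proposition 2.1 the MLE cannot exist and the iteration cannot settle.

```python
import numpy as np

def cycle_partial(r12, r23, r34, r14):
    """Partial correlation matrix on the 4-cycle 1-2-3-4-1.

    Entries at the two non-edges {1,3} and {2,4} are set to 0 but are
    never read by ips() when the cliques are the four edges."""
    M = np.eye(4)
    for (i, j), r in zip([(0, 1), (1, 2), (2, 3), (0, 3)],
                         [r12, r23, r34, r14]):
        M[i, j] = M[j, i] = r
    return M

edges = [[0, 1], [1, 2], [2, 3], [0, 3]]
good = cycle_partial(0.5, 0.5, 0.5, 0.5)   # PD completion exists
bad = cycle_partial(0.9, 0.9, 0.9, -0.9)   # no PD completion exists
print(ips(good, edges, 4) is not None)     # True expected: MLE found
print(ips(bad, edges, 4) is not None)      # False expected: iteration fails
```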
4. The proof of Theorem 1.1
In this section we prove Theorem 1.1, that is, the bounds given by Eq. (1.1). The lower and the upper bounds for $\mathrm{grank}(G)$ are proved in separate subsections. First we state and prove the following key lemma.

Lemma 4.1. Let $v$ be a vertex of $G$. If $\deg(v) < \mathrm{grank}(G - v)$, then $\mathrm{grank}(G) = \mathrm{grank}(G - v)$.

Proof. If $\deg(v) = 0$, then $\mathrm{grank}(G) = \mathrm{grank}(G - v)$ by Proposition 3.4. Thus we assume that $\deg(v) \ge 1$. For convenience, let us assume that $v = p$ and $\operatorname{ne}(v) = \{1, \dots, d\}$, where $d = \deg(v)$, and set $n = \mathrm{grank}(G - v)$; note that $d < n$. By Corollary 3.1 we have $\mathrm{grank}(G) \ge n$, so it suffices to show that $\mathrm{grank}(G) \le n$. Let $B \in \mathcal{S}_p^{\ge 0}(n)$ be in general position. We need to show that there is a matrix $\Sigma \in \mathcal{S}_p^{>0}$ such that $\Sigma \stackrel{G}{=} B$. For this, we assume that $B$ is a correlation matrix and partition it as
$$B = \begin{pmatrix} B_{11} & b \\ b^{\top} & 1 \end{pmatrix}, \qquad B_{11} = B_{V \setminus \{v\}}.$$
By the Guttman rank additivity formula in [19], the Schur complement $B_{11} - b b^{\top}$ belongs to $\mathcal{S}_{p-1}^{\ge 0}(n-1)$ and, $B$ being in general position, it is in general position as well. By Part (a) of Lemma 2.3 and Remark 2.2, we can choose a vector $\tilde{b} \in \mathbb{R}^{p-1}$ with $\tilde{b}_i = b_i$ for every $i \in \operatorname{ne}(v)$ such that $(B_{11} - b b^{\top}) + \tilde{b} \tilde{b}^{\top}$ is a general-position matrix in $\mathcal{S}_{p-1}^{\ge 0}(n)$; it is here that the hypothesis $\deg(v) < n$ is used. Since $n = \mathrm{grank}(G - v)$, there is a matrix $\Sigma_1 \in \mathcal{S}_{p-1}^{>0}$ such that $\Sigma_1 \stackrel{G - v}{=} (B_{11} - b b^{\top}) + \tilde{b} \tilde{b}^{\top}$. Let us set
$$\Sigma = \begin{pmatrix} \Sigma_1 + b b^{\top} - \tilde{b} \tilde{b}^{\top} & b \\ b^{\top} & 1 \end{pmatrix}.$$
By Eq. (3.1) it is clear that $\Sigma \stackrel{G}{=} B$. Also, the Schur complement of the $(v, v)$ entry of $\Sigma$ equals $\Sigma_1 - \tilde{b} \tilde{b}^{\top}$, which is positive definite for a suitable choice of $\tilde{b}$. Therefore $\Sigma \in \mathcal{S}_p^{>0}$. □

4.1. The upper bound: $\mathrm{grank}(G) \le \delta^*(G) + 1$

Proof. We proceed by mathematical induction on the number of vertices: assume that for any graph with fewer than $p$ vertices the Gaussian rank is at most the degeneracy number plus one, and let $G$ be a graph with $p$ vertices (for $p = 1$ the claim is trivial). Let $v$ be such that $\deg(v) = \delta(G)$. On the contrary, let us assume that $\mathrm{grank}(G) > \delta^*(G) + 1$. Then by Part (a) of Proposition 3.3 we obtain
$$\mathrm{grank}(G - v) \ge \mathrm{grank}(G) - 1 > \delta^*(G) \ge \delta(G) = \deg(v).$$
Lemma 4.1 now implies that $\mathrm{grank}(G) = \mathrm{grank}(G - v)$. The induction hypothesis then implies that $\mathrm{grank}(G) \le \delta^*(G - v) + 1$. The fact that $\delta^*(G - v) \le \delta^*(G)$ [15] implies that $\mathrm{grank}(G) \le \delta^*(G) + 1$, a contradiction. □

Remark 4.1. Note that Lemma 4.1 also holds for the weak Gaussian rank; that is, for any graph $G$, if $\deg(v) < \mathrm{wgrank}(G - v)$, then $\mathrm{wgrank}(G) = \mathrm{wgrank}(G - v)$. Consequently, this implies $\mathrm{wgrank}(G) \le \delta^*(G) + 1$. But the latter more easily follows from the fact that $\mathrm{wgrank}(G) \le \mathrm{grank}(G)$.

4.2. The lower bound: $\mathrm{grank}(G) \ge \kappa^*(G) + 1$

To prove that this lower bound holds we mainly rely on the Lovász–Saks–Schrijver theorem in [11]. This theorem associates the connectivity of a graph with certain geometric representations of it. More details are as follows. Let $G$ be a graph. An orthonormal representation of $G$ in $\mathbb{R}^r$ is a function assigning a unit vector $u_v \in \mathbb{R}^r$ to each vertex $v$ such that $\langle u_v, u_w \rangle = 0$ whenever $\{v, w\} \notin E$. In words, it assigns to each vertex a unit vector in $\mathbb{R}^r$ such that the vectors assigned to non-adjacent vertices are orthogonal. A general-position orthonormal representation of $G$ in $\mathbb{R}^r$ is an orthonormal representation such that the assigned vectors are in general position in $\mathbb{R}^r$. The next theorem is crucial for proving the lower bound in Theorem 1.1.

Theorem 4.1 (Lovász–Saks–Schrijver [11], [12]). If $G$ is a graph with $p$ vertices, then the following are equivalent:
(a) $G$ is $(p - r)$-connected;
(b) $G$ has a general-position orthonormal representation in $\mathbb{R}^r$.

Remark 4.2. We note that in the proof of (a) $\Rightarrow$ (b) in [11] it is shown that the set of general-position orthonormal representations of $G$ in $\mathbb{R}^r$ is contained in a subvariety of $(\mathbb{R}^r)^p$ and, consequently, is of probability zero.

Proof of the lower bound. Since $\mathrm{grank}(G') \le \mathrm{grank}(G)$ for every subgraph $G'$ of $G$ (Corollary 3.1), in light of Part (a) and Part (b) in Remark 2.1 it suffices to show that for every graph $G$ we have $\mathrm{grank}(G) \ge \kappa(G) + 1$. First we set $n = \kappa(G)$ and $r = p - n$. By Theorem 4.1 there are general-position unit vectors $u_1, \dots, u_p \in \mathbb{R}^r$ such that $\langle u_i, u_j \rangle = 0$ for each $\{i, j\} \notin E$. Let $\Omega_{ij} = \langle u_i, u_j \rangle$. By Lemma 2.2, $\Omega \in \mathcal{S}_p^{\ge 0}(r)$ is in general position. Let $\{w_1, \dots, w_n\}$ be a basis of the orthogonal complement in $\mathbb{R}^p$ of the column space of the $p \times r$ matrix with rows $u_1^{\top}, \dots, u_p^{\top}$, and set
$$B = \sum_{k=1}^{n} w_k w_k^{\top} \in \mathcal{S}_p^{\ge 0}(n).$$
The proof of Lemma 6.3 in [18] shows that the matrix $B$ is in general position. Now let $\Sigma \in \mathcal{S}_p$ be such that $\Sigma \stackrel{G}{=} B$. Since $\Omega$ vanishes on the non-edges of $G$, we obtain
$$\langle \Sigma, \Omega \rangle = \langle B, \Omega \rangle = 0.$$
This shows that $\Sigma$ cannot be positive definite: since $\mathcal{S}_p^{\ge 0}$ is a self-dual cone, $\langle \Sigma, \Omega \rangle > 0$ for every $\Sigma \in \mathcal{S}_p^{>0}$ and non-zero $\Omega \in \mathcal{S}_p^{\ge 0}$. Thus there is no matrix $\Sigma \in \mathcal{S}_p^{>0}$ such that $\Sigma \stackrel{G}{=} B$. This shows that $\mathrm{grank}(G) \ge n + 1 = \kappa(G) + 1$. □

Remark 4.3. Note that the proof of the lower bound above does not imply that $\mathrm{wgrank}(G) \ge \kappa^*(G) + 1$. This is because, by using the Lovász–Saks–Schrijver theorem, in light of Remark 4.2, the set of all general-position matrices $B$ arising in our proof is of probability zero, and this is not sufficient to conclude $\mathrm{wgrank}(G) \ge \kappa^*(G) + 1$.
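As a concrete illustration of the lower-bound argument (a worked example of ours, not taken from the paper), take $G = C_4$, so that $\kappa(C_4) = 2$, $p = 4$ and $r = p - \kappa = 2$. With $0 < \alpha < \pi/2$, $c = \cos \alpha$ and $s = \sin \alpha$, the unit vectors
$$u_1 = (1, 0), \quad u_2 = (c, s), \quad u_3 = (0, 1), \quad u_4 = (-s, c)$$
form a general-position orthonormal representation of $C_4$ in $\mathbb{R}^2$ (the non-edges are $\{1,3\}$ and $\{2,4\}$). The Gram matrix $\Omega$ of the representation and the Gram matrix $B$ of the complement basis $w_1 = (-c, 1, -s, 0)^{\top}$, $w_2 = (s, 0, -c, 1)^{\top}$ are
$$\Omega = \begin{pmatrix} 1 & c & 0 & -s \\ c & 1 & s & 0 \\ 0 & s & 1 & c \\ -s & 0 & c & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -c & 0 & s \\ -c & 1 & -s & 0 \\ 0 & -s & 1 & -c \\ s & 0 & -c & 1 \end{pmatrix}.$$
Here $B \in \mathcal{S}_4^{\ge 0}(2)$ is in general position (every off-diagonal entry has modulus less than 1), $\Omega$ vanishes on the non-edges, and $\langle B, \Omega \rangle = 4 - 2(c^2 + s^2) - 2(c^2 + s^2) = 0$. Hence any $\Sigma$ with $\Sigma \stackrel{C_4}{=} B$ satisfies $\langle \Sigma, \Omega \rangle = 0$ and cannot be positive definite, certifying $\mathrm{grank}(C_4) \ge 3$, in accordance with (R3).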
5. Some applications of Theorem 1.1
A useful application of the bounds given by Theorem 1.1 is that when the lower and upper bounds are equal the Gaussian rank is exactly determined. In this section we briefly discuss for which graphs $\kappa^*(G) = \delta^*(G)$. We also use Theorem 1.1 to obtain a sharp numerical upper bound for the Gaussian ranks of the so-called planar graphs.

5.1. Gaussian ranks of symmetric graphs

Let $G$ be a graph. A permutation $\sigma$ of $V$ is said to be an automorphism of $G$ if $\{\sigma(u), \sigma(v)\} \in E$ whenever $\{u, v\} \in E$. The graph $G$ is said to be symmetric if for any two edges $\{u_1, v_1\}$ and $\{u_2, v_2\}$ there is an automorphism $\sigma$ of $G$ such that $\sigma(u_1) = u_2$ and $\sigma(v_1) = v_2$ (please see Chapter 27 in [7] for a detailed discussion of symmetric graphs). For example, one can check that the graphs given by Fig. 2(a) and (b) are symmetric. The symmetric property in particular implies that $G$ is regular, that is, there is a positive integer $k$ such that $\deg(v) = k$ for every vertex $v$. An interesting feature of a symmetric graph is that $\kappa(G) = \delta(G)$. This consequently implies that $\kappa^*(G) = \delta^*(G) = \delta(G)$. Theorem 1.1 therefore implies that for a symmetric graph the Gaussian rank is exactly determined as
(5.1) $\mathrm{grank}(G) = \delta(G) + 1$.
Note that Eq. (5.1) can hold for many non-symmetric graphs as well, such as the regular graph given by Fig. 2(c), or even non-regular graphs such as grids. To the best of our knowledge, the class of all graphs satisfying $\kappa^*(G) = \delta^*(G)$, or even $\mathrm{grank}(G) = \delta^*(G) + 1$, is not fully characterized.

Fig. 2. Denoted graphs in (a) and (b) are symmetric. The graph given by (c) is not symmetric. In each graph the equality in Eq. (5.1) holds and the Gaussian ranks are easily computed to be 4, 4 and 5, respectively.

5.2. Gaussian ranks of random graphs

A random graph $G \in \mathcal{G}(n, 1/2)$ in the Erdős–Rényi model is a graph on $n$ vertices in which the edges are selected by a sequence of (independent) Bernoulli trials with probability $1/2$ [7].

Theorem 5.1 (Bollobás et al.). Almost surely for every random graph $G \in \mathcal{G}(n, 1/2)$ we have $\kappa^*(G) = \delta^*(G)$.

Corollary 5.1. Almost surely for every random graph $G \in \mathcal{G}(n, 1/2)$ we have
$$\mathrm{grank}(G) = \kappa^*(G) + 1 = \delta^*(G) + 1.$$
This in particular implies that almost surely for every random graph $G \in \mathcal{G}(n, 1/2)$ the lower and upper bounds in Theorem 1.1 coincide, so that the Gaussian rank is exactly determined.

5.3. On the Gaussian ranks of planar graphs

A planar graph is a graph that can be drawn in the plane without its edges crossing each other. In graph theory it is well known that planar graphs are at most 5-degenerate, that is, $\delta^*(G) \le 5$ if $G$ is planar. Theorem 1.1 therefore implies that $\mathrm{grank}(G) \le 6$ when $G$ is a planar graph. Note that the upper bound 6 is tight. For this, consider the planar graph given by Fig. 3: it is symmetric with $\delta(G) = 5$, so by Eq. (5.1) we have $\mathrm{grank}(G) = 6$.

Fig. 3. A planar graph with degeneracy number 5.

Remark 5.1. We note that Gross and Sullivant in [6] prove that $\mathrm{wgrank}(G) \le 4$ for every planar graph $G$. Comparing this with the tight upper bound above shows that for some graphs the weak Gaussian rank can be strictly less than the Gaussian rank. For example, for the graph given by Fig. 3 we have $\mathrm{wgrank}(G) \le 4 < 6 = \mathrm{grank}(G)$.
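The tightness claim for planar graphs is easy to check numerically. The icosahedral graph is a 5-regular, symmetric planar graph; reusing the hypothetical `degeneracy` helper from the introduction together with NetworkX (our own tooling choice), a short check confirms $\delta^*(G) = 5$ and hence, by Eq. (5.1), $\mathrm{grank}(G) = 6$.

```python
import networkx as nx

G = nx.icosahedral_graph()             # 5-regular, symmetric, planar
adj = {v: set(G.neighbors(v)) for v in G}
print(degeneracy(adj))                 # 5, hence grank = 6 by Eq. (5.1)
```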
Acknowledgments
Special thanks to Steen Andersson, Gérard Letac, Hélène Massam and Bala Rajaratnam for helpful discussions and feedback. I am also grateful to the referees for their detailed, helpful comments.
Appendix.
Proof of Lemma 2.1. First we note that randomly selecting $m$ vectors in $\mathbb{R}^n$ is equivalent to randomly selecting a matrix $X \in \mathbb{R}^{n \times m}$. Let $Z$ denote an $n \times m$ matrix of variables. For each $A \subseteq \{1, \dots, m\}$ with $|A| = n$ we define the polynomial $p_A(Z) = \det(Z_{\times A})$, the determinant of the $n \times n$ submatrix of $Z$ with columns indexed by $A$. The zero set of each $p_A$ is an algebraic set and its (Lebesgue) measure is therefore zero. Thus, with probability one, $p_A(X) \ne 0$ for every such $A$. This completes the proof, since each $n$ columns of such a matrix $X$ are linearly independent. □
Proof of Lemma 2.2. Suppose $B \in \mathcal{S}_p^{\ge 0}(n)$. Let each pair $(\lambda_i, v_i)$ denote a positive eigenvalue of $B$ and its corresponding unit eigenvector. Let us set $x_i = \sqrt{\lambda_i}\, v_i$, for $i = 1, \dots, n$. Then we have
(A.1) $B = \sum_{i=1}^{n} x_i x_i^{\top} = X X^{\top}$.
Note that $x_1, \dots, x_n$ are the columns of $X \in \mathbb{R}^{p \times n}$, and the rows of $X$ provide vectors $x^{(1)}, \dots, x^{(p)} \in \mathbb{R}^n$ satisfying Eq. (2.2). Suppose that for some $A \subseteq \{1, \dots, p\}$ with $|A| = n$ the vectors $(x^{(i)})_{i \in A}$ are linearly dependent. Then there is a non-zero vector $\alpha$ such that $\alpha^{\top} X_{A \times} = 0$. Thus $B_A \alpha = X_{A \times} X_{A \times}^{\top} \alpha = 0$, and consequently $B_A$ is singular. Conversely, suppose $x^{(1)}, \dots, x^{(p)}$ are in general position and let $B$ be defined as in Eq. (2.2). Then for each $A$ with $|A| = n$ the submatrix $B_A$ is the Gram matrix of $n$ linearly independent vectors and is therefore non-singular. □
Proof of Corollary 2.1. Let $Y = (y^{(1)}, \dots, y^{(n)})^{\top}$ be the $n \times p$ data matrix. Then by Lemma 2.1, with probability one the columns of $Y$, viewed as $p$ vectors in $\mathbb{R}^n$, are in general position. The result now follows from Lemma 2.2, since whenever the columns of $Y$ are in general position the matrix $Y^{\top} Y$ is also in general position; the same argument applies to the sample correlation matrix. □
References
- [1] S.L. Buhl, On the existence of maximum likelihood estimators for graphical Gaussian models, Scand. J. Stat. 20 (3) (1993) 263–270.
- [2] J. Dahl, L. Vandenberghe, V. Roychowdhury, Covariance selection for nonchordal graphs via chordal embedding, Optim. Methods Softw. 23 (4) (2008) 501–520.
- [3] A.P. Dempster, Covariance selection, Biometrics 28 (1972) 157–175.
- [4] M.L. Eaton, M.D. Perlman, The non-singularity of generalized sample covariance matrices, Ann. Statist. 1 (1973) 710–717.
- [5] R. Grone, C.R. Johnson, E.M. Sá, H. Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984) 109–124.
- [6] E. Gross, S. Sullivant, The maximum likelihood threshold of a graph, 2014. arXiv:1404.6989.
- [7] J.L. Gross, J. Yellen (Eds.), Handbook of Graph Theory, Discrete Mathematics and its Applications (Boca Raton), CRC Press, Boca Raton, FL, 2004.
- [8] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second ed., Springer Series in Statistics, Springer, New York, 2009.
- [9] A.S. Householder, The Theory of Matrices in Numerical Analysis, Dover Publications, Inc., New York, 1975. Reprint of the 1964 edition.
- [10] S.L. Lauritzen, Graphical Models, Oxford Statistical Science Series, vol. 17, The Clarendon Press, Oxford University Press, New York, 1996.
- [11] L. Lovász, M. Saks, A. Schrijver, Orthogonal representations and connectivity of graphs, Linear Algebra Appl. 114/115 (1989) 439–454.
- [12] L. Lovász, M. Saks, A. Schrijver, A correction: orthogonal representations and connectivity of graphs [Linear Algebra Appl. 114/115 (1989) 439–454], Linear Algebra Appl. 313 (1–3) (2000) 101–105.
- [13] J.D. Malley, Statistical and algebraic independence, Ann. Statist. 11 (1) (1983) 341–345.
- [14] B. Mans, L. Mathieson, On the treewidth of dynamic graphs, Theoret. Comput. Sci. 554 (2014) 217–228.
- [15] D.W. Matula, Subgraph connectivity numbers of a graph, in: Theory and Applications of Graphs (Proc. Internat. Conf., Western Mich. Univ., Kalamazoo, Mich., 1976), Lecture Notes in Math., vol. 642, Springer, Berlin, 1978, pp. 371–383.
- [16] T.P. Speed, H.T. Kiiveri, Gaussian Markov distributions over finite graphs, Ann. Statist. 14 (1) (1986) 138–150.
- [17] C. Uhler, Geometry of maximum likelihood estimation in Gaussian graphical models, Ann. Statist. 40 (1) (2012) 238–261.
- [18] H. van der Holst, Graphs with magnetic Schrödinger operators of low corank, J. Combin. Theory Ser. B 84 (2) (2002) 311–339.
- [19] F. Zhang (Ed.), The Schur Complement and its Applications, Numerical Methods and Algorithms, vol. 4, Springer-Verlag, New York, 2005.
¹ Throughout the paper, we always assume that the observations are independent and identically distributed.