Sharp lower and upper bounds for the Gaussian rank of a graph

https://doi.org/10.1016/j.jmva.2015.03.004

Abstract

An open problem in graphical Gaussian models is to determine the smallest number of observations needed to guarantee the existence of the maximum likelihood estimator of the covariance matrix with probability one. In this paper we formulate a closely related problem in which the existence of the maximum likelihood estimator is guaranteed for all generic observations. We call the number determined by this problem the Gaussian rank of the graph representing the model. We prove that the Gaussian rank lies between the subgraph connectivity number plus one and the graph degeneracy number plus one. These bounds are sharper than the bounds known in the literature and are, furthermore, computable in polynomial time.

AMS subject classification

62H12
97K30

Keywords

Gaussian graphical models
Maximum likelihood estimation
Tree-width
Connectivity number
Degeneracy number
Random graphs
Symmetric graphs

1. Introduction

An open problem in graphical Gaussian models is to determine the smallest number of observations needed to guarantee the existence of the maximum likelihood estimator (MLE) of the covariance matrix. This problem first arose in Dempster's paper [3] and has been frequently brought to attention by Steffen Lauritzen, as evident in [1] and in Lauritzen's lectures at the "Durham Symposium on Mathematical Aspects of Graphical Models 2008". Hence we refer to this problem as the D&L problem, using the initials of Dempster and Lauritzen. Note that the D&L problem as stated above is not well-posed, because the existence of the MLE cannot be guaranteed for all observations, no matter how large the number of observations is. To make the problem well-posed, it is necessary to ignore a set of observations of probability zero and thus require the existence of the MLE only for observations in a set of probability one. It is now clear that the D&L problem in its most general form can be expressed as follows.

D&L problem (I) For a given graphical Gaussian model with respect to a graph G, determine the smallest number of observations needed to guarantee the existence of the MLE with probability one.

A slightly different well-posed formulation of the D&L problem is to require the existence of the MLE (of the covariance matrix, henceforth) for all generic observations in the following sense. Let $n \le p$. Then the observations¹ $x_1,\dots,x_n \in \mathbb{R}^p$ are said to be generic if each $n \times n$ principal submatrix of the sample covariance matrix $S = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top$ is non-singular. Note that with probability one every $n$ observations are generic (thus the set of non-generic observations is of probability zero). This formulation can now be expressed as follows.

D&L problem (II) For a given graphical Gaussian model with respect to a graph G, determine the smallest number of observations needed to guarantee the existence of the MLE for all generic observations.

The number determined by the D&L problem (II) is said to be the Gaussian rank of $G$, denoted by $r(G)$. The primary goal of this paper is to obtain sharp lower and upper bounds on $r(G)$. In parallel to the D&L problem (II), the number determined by the D&L problem (I) is said to be the weak Gaussian rank of $G$, denoted by $r_w(G)$. Note again that being in general position is a generic property of any $n$ observations; that is, with probability one, for every $n$ observations each $n \times n$ principal submatrix of the sample covariance matrix is non-singular. Thus obviously $r_w(G) \le r(G)$ for every graph $G$.
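To make the notion of generic observations concrete, the following minimal NumPy sketch (ours, not the paper's) samples $n \le p$ observations and verifies the defining condition, namely that every $n \times n$ principal submatrix of $S$ is non-singular. The brute-force enumeration of index subsets is only practical for small $p$.

```python
import itertools
import numpy as np

def is_generic(X, tol=1e-10):
    """Return True if the n observations (rows of X) are generic,
    i.e. every n x n principal submatrix of S = (1/n) X^T X is
    non-singular (determinant bounded away from zero by tol)."""
    n, p = X.shape
    S = X.T @ X / n
    for alpha in itertools.combinations(range(p), n):
        if abs(np.linalg.det(S[np.ix_(alpha, alpha)])) < tol:
            return False
    return True

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6))  # n = 3 observations in R^6
print(is_generic(X))             # True, with probability one
```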

As we mentioned earlier, the D&L problem first arose in [3], a celebrated work of Dempster in which, under the principle of parsimony in fitting parametric models, Dempster introduced graphical Gaussian models and gave a partial analytic expression for the MLE of the covariance matrix, assuming that the MLE exists. In practice, the MLE has to be computed by an iterative numerical procedure, such as the iterative proportional scaling (IPS) procedure [16]. However, if the number of observations is not sufficiently large, there is no guarantee that the output matrix will indeed correspond to the MLE, as it might not be positive definite. It is clear that the number determined by the D&L problem under either formulation depends on the level of sparsity in the graph, but the exact nature of this dependency is far from clear, mainly because sparse graphs are not well understood; in particular, it is not known which graph parameters best measure graph sparsity. Hence the D&L problems (I) and (II) are important problems not only in relation to maximum likelihood estimation but also for gaining deeper insight into sparse graphs. In practice, the D&L problem (II) is equivalent to the D&L problem (I), because in general the data can be expected to be generic, and therefore $n \ge r(G)$ is a condition that minimally guarantees the existence of the MLE.

In a graphical Gaussian model the sparsity is given by a pattern of zeros in the inverse covariance matrix. The pattern of zeros is determined by the missing edges of a graph, and each zero entry of the inverse covariance matrix indicates that the corresponding pair of variables is conditionally independent given the remaining variables. An attractive feature of graphical Gaussian models, or graphical models in general, is that if the graph representing the model is sparse, then the MLE of the covariance matrix can exist even if the number of observations is much smaller than the dimension of the multivariate Gaussian distribution. Intuitively, we expect the Gaussian rank of the graph, the number determined by the D&L problem, to decrease as the graph becomes sparser. However, it is not clear what best measures the sparsity of a graph or how the Gaussian rank varies accordingly. The main theorem of this paper suggests two such measures of sparsity.

Despite some efforts since the 1990s, not much progress has been made toward resolving the D&L problem. The existing results are limited to a handful of publications, as follows.

  • (R1)

    [5] For a decomposable graph $G$, that is, a graph that has no induced cycle of length four or more, the Gaussian rank is equal to $\omega(G)$, the size of the largest complete subgraph of $G$.

  • (R2)

    [5] For every graph $G$, $\omega(G) \le r_w(G) \le r(G) \le tw(G)+1$, where $tw(G)$ denotes the treewidth of $G$ (see [7] for the definition).

  • (R3)

    [1] For $C_p$, a cycle of length $p \ge 3$, $2 = \omega(C_p) < r_w(C_p) = r(C_p) = 3 = tw(C_p)+1$.

  • (R4)

    [17] For $G_{3,3}$, the $3\times 3$ grid, $2 = \omega(G_{3,3}) < r_w(G_{3,3}) = r(G_{3,3}) = 3 < tw(G_{3,3})+1 = 4$.

In the literature the bounds given by (R2) are currently the best known bounds for the Gaussian rank. Restricted to the class of decomposable graphs these bounds are tight, since $tw(G)+1 = \omega(G)$ for a decomposable graph $G$; but we note that (R4) in [17] shows that for non-decomposable graphs the bounds in (R2) are not necessarily tight. Intuitively, it is apparent that $\omega(G)$ overestimates and $tw(G)$ underestimates the sparsity of $G$, and therefore sharper bounds may exist. In this paper we give sharper bounds on the Gaussian rank. The lower and upper bounds we give are based on the subgraph connectivity number, denoted by $\kappa^*(G)$, and the graph degeneracy number, denoted by $\delta^*(G)$. Both $\kappa^*(G)$ and $\delta^*(G)$ will be defined in Section 2.2. Formally, we prove the following theorem.

Theorem 1.1

Let $G=(V,E)$ be a graph. Then
$$\kappa^*(G) + 1 \;\le\; r(G) \;\le\; \delta^*(G) + 1. \qquad (1.1)$$

The proof of this theorem is given in Section 4. The upper bound is proved by mathematical induction using two key observations: suppose a graph $H$ is obtained from $G$ by removing a vertex $v$ and its incident edges. Then (1) $r(H) \ge r(G) - 1$, and (2) if $r(H)$ is larger than the number of vertices adjacent to $v$, then $r(H) = r(G)$. The proof of the lower bound relies on the Lovász–Saks–Schrijver Theorem in [11].

All the results stated in (R1) through (R4) now immediately follow from Theorem 1.1, since by some simple calculations we can show that

  • (a)

    for a decomposable graph $G$, $\omega(G) = \kappa^*(G)+1 = \delta^*(G)+1 = tw(G)+1$;

  • (b)

    for any (arbitrary) graph $G$, $\omega(G) \le \kappa^*(G)+1 \le r(G) \le \delta^*(G)+1 \le tw(G)+1$;

  • (c)

    for a cycle (of any length) $G$, $\kappa^*(G) = \delta^*(G) = 2$;

  • (d)

    for a $k\times m$ grid (with $k$ and $m \ge 2$), $\kappa^*(G) = \delta^*(G) = 2$.

Note that by Part (d) the Gaussian rank of every grid is 3, which is substantially less than the upper bound given by (R2) for grids of large dimensions. The reason is that the treewidth of a $k\times m$ grid is $\min\{k,m\}$, which tends to $+\infty$ as $k$ and $m \to +\infty$ [14]. Here we mention that by relating $r_w(G)$ to the rigidity of the graph $G$, Gross and Sullivant in a recent manuscript [6] also obtain the upper bound in Theorem 1.1 for $r_w(G)$.

This paper is structured as follows. In Section  2 we first introduce required preliminaries and notation in graph theory, matrix algebra and Gaussian graphical models, and then explain the MLE problem for Gaussian graphical models. In Section  3 we give an alternative description of the Gaussian rank and derive some of its properties. In Section  4 we prove Theorem 1.1. In Section  5 we apply Theorem 1.1 to some special graphs to exactly determine their Gaussian ranks. These graphs include symmetric graphs and random graphs. We also obtain a tight numerical upper bound for the Gaussian ranks of planar graphs.

2. Preliminaries

In this section, we establish some necessary notation, terminology and definitions in graphical models, matrix algebra and geometric graph theory. We also carefully explain how the maximum likelihood problem for graphical Gaussian models leads to the D&L problem.

2.1. Graph theoretical notions and definitions

Our notation presented here closely follows the notation established in [15], [10]. Let $G=(V,E)$ be an undirected graph, where $V = V(G)$ is the vertex set and $E = E(G)$ is the edge set of $G$. Each edge $e \in E$ is an unordered pair $ij = \{i,j\}$, where $i,j \in V$. For ease of notation we assume that $E$ contains all the self-loops, that is, $(i,i)$ for every $i \in V$. Unless otherwise stated, we always assume that $V(G) = \{1,\dots,p\} = [p]$.

  • (1)

    For a vertex $v \in V$, the set and the number of vertices adjacent to $v$ are denoted by $\mathrm{ne}(v)$ and $\deg_G(v)$, respectively.

  • (2)

    A graph $H$ is a subgraph of $G$, denoted by $H \subseteq G$, if $V(H) \subseteq V(G)$ and $E(H) \subseteq E(G)$.

  • (3)

    For a set $V' \subseteq V$, the graph defined by $G[V'] = (V',\, E \cap (V' \times V'))$ is said to be the subgraph of $G$ induced by $V'$.

  • (4)

    For a vertex $v \in V$, the induced subgraph $G[V \setminus \{v\}]$ is denoted by $G \setminus v$. Note that $G \setminus v$ is simply obtained by removing $v$ and its incident edges from $G$.

  • (5)

    For an edge $e \in E$, $G \setminus e$ denotes the subgraph obtained by removing $e$ from $G$, that is, $G \setminus e = (V, E \setminus \{e\})$.

2.2. Graph parameters

Let $\mathcal{G}$ denote the set of all graphs. A graph parameter is a function $\mu: \mathcal{G} \to \mathbb{R}$, $G \mapsto \mu(G)$. In words, a graph parameter is a function that assigns a number to each graph. In graph theory two common graph parameters are:

  • (a)

    $\delta(G) = \min\{\deg_G(v) : v \in V\}$, said to be the minimum degree of $G$;

  • (b)

    $\kappa(G) = \min\{|S| : S \subseteq V \text{ such that } G[V \setminus S] \text{ is disconnected}\}$, said to be the vertex connectivity number of $G$. In words, $\kappa(G)$ is the smallest number of vertices whose deletion disconnects the graph or makes it trivial.

Now for every graph parameter $\mu$ we can define a new graph parameter as $\mu^*(G) = \max\{\mu(H) : H \subseteq G\}$. Two such defined graph parameters are $\delta^*(G)$ and $\kappa^*(G)$, known as the graph degeneracy number and the subgraph connectivity number of $G$, respectively [15]. Note that $\kappa^*(G)$ is the smallest number of vertex deletions sufficient to disconnect each subgraph of $G$. Since $\kappa(G) \le \delta(G)$ we have
$$\kappa^*(G) \le \delta^*(G) \qquad (2.1)$$
(see Part (a) in Remark 2.1).
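The degeneracy number is easy to compute in practice (cf. Part (d) of Remark 2.1 below). The sketch that follows (ours, not from the paper) uses the standard peeling characterization: repeatedly delete a vertex of minimum degree, and $\delta^*(G)$ is the largest minimum degree encountered along the way.

```python
def degeneracy(adj):
    """Compute delta*(G) = max over subgraphs H of the minimum degree
    of H, by repeatedly deleting a vertex of minimum degree.
    `adj` maps each vertex to the set of its neighbours."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # defensive copy
    delta_star = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))      # minimum-degree vertex
        delta_star = max(delta_star, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return delta_star

# A cycle of length 4 is 2-degenerate (cf. Example 2.1(b)).
c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(degeneracy(c4))  # 2
```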

Example 2.1

Consider the graphs given by Fig. 1(a)–(c).

  • (a)

    The graph denoted by $G_{4,5}$ in Fig. 1(a) is a $4\times 5$ grid. It is easy to see that $\kappa^*(G_{4,5}) = \delta^*(G_{4,5}) = 2$. In fact for every $k\times m$ grid (with $k$ and $m \ge 2$), $\kappa^*(G_{k,m}) = \delta^*(G_{k,m}) = 2$.

  • (b)

    The graph denoted by $C_8$ in Fig. 1(b) is a cycle of length 8 with $\kappa^*(C_8) = \delta^*(C_8) = 2$. This is obviously true for any arbitrary cycle $C_p$ (with length $p \ge 3$).

  • (c)

    The graph denoted by $K_{4,3}$ in Fig. 1(c) is a complete $(4,3)$-bipartite graph. In general, a complete $(k,m)$-bipartite graph is a graph whose vertex set can be partitioned into two sets with $k$ and $m$ elements such that two vertices are adjacent if and only if they lie in different parts of the partition. One can check that $\kappa^*(K_{k,m}) = \delta^*(K_{k,m}) = \min\{k,m\}$ [15].


Fig. 1. The graphs shown are (a) a $4\times 5$ grid, $G_{4,5}$; (b) a cycle of length 8, $C_8$; (c) a complete $(4,3)$-bipartite graph, $K_{4,3}$.

Remark 2.1

These simple facts can be found in  [15].

  • (a)

    If two graph parameters are such that $\mu_1(H) \le \mu_2(H)$ for every subgraph $H \subseteq G$, then $\mu_1^*(G) \le \mu_2^*(G)$.

  • (b)

    If a graph parameter is non-decreasing, that is, $H \subseteq G \Rightarrow \mu(H) \le \mu(G)$, then $\mu^*(G) = \mu(G)$. For example, for this reason $\omega^*(G) = \omega(G)$ and $tw^*(G) = tw(G)$.

  • (c)

    For every graph $G$ we have $\omega(G) \le \kappa(G)+1 \le \delta(G)+1 \le tw(G)+1$. In light of (a) and (b), this implies that $\omega(G) \le \kappa^*(G)+1 \le \delta^*(G)+1 \le tw(G)+1$.

  • (d)

    There are algorithms (specifically given in [15]) that compute $\kappa^*(G)$ and $\delta^*(G)$, respectively, in no more than $O(|V|^{3/2}|E|^2)$ and $O(|E|)$ steps (in contrast, the clique number and the treewidth cannot be computed in polynomial time unless P = NP).

2.3. Matrix algebra notation, definitions

The notation presented here closely follows the notation established in [19], [15]. For a vector $u = (u_i : i \in V) \in \mathbb{R}^V$ and $\alpha \subseteq V$, let $u[\alpha]$ denote the subvector $(u_i : i \in \alpha) \in \mathbb{R}^\alpha$. For a matrix $A = (A_{i,j}) \in \mathbb{R}^{V\times V}$ and $\alpha$ and $\beta \subseteq V$, let $A[\alpha,\beta]$ denote the submatrix $(A_{i,j} : i \in \alpha,\, j \in \beta) \in \mathbb{R}^{\alpha\times\beta}$. A principal submatrix $A[\alpha,\alpha]$ is simply denoted by $A[\alpha]$. Some other notations used throughout the paper are as follows.

  • (1)

    The set of $p\times p$ symmetric matrices is denoted by $S(p)$.

  • (2)

    The set of (symmetric) positive semi-definite matrices is denoted by $S_+(p)$.

  • (3)

    The set of (symmetric) positive definite matrices is denoted by $S_{++}(p)$.

  • (4)

    The set of matrices $A \in S_+(p)$ of a fixed rank $d$ is denoted by $S_+(p,d)$.

  • (5)

    $A \succeq 0$ (or $A \succ 0$) denotes that $A$ is a positive semi-definite (or positive definite) matrix, without specifying its dimension.

  • (6)

    The Moore–Penrose generalized inverse of $A$ is denoted by $A^\dagger$. Note that $A A^\dagger A = A$ and $A^\dagger A A^\dagger = A^\dagger$. It is clear that $A^\dagger = A^{-1}$ when $A$ is non-singular.

  • (7)

    Let $A$ be partitioned as
$$A = \begin{pmatrix} A[\alpha] & A[\alpha,\beta] \\ A[\beta,\alpha] & A[\beta] \end{pmatrix},$$
where $\beta = V \setminus \alpha$. Then the Schur complement of $A[\alpha]$, denoted by $A[\beta|\alpha]$, is defined by $A[\beta|\alpha] = A[\beta] - A[\beta,\alpha]\, A[\alpha]^\dagger A[\alpha,\beta]$.
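The following NumPy sketch (ours, for illustration only) makes items (6) and (7) concrete by computing a Schur complement with the Moore–Penrose inverse.

```python
import numpy as np

def schur_complement(A, alpha):
    """A[beta|alpha] = A[beta] - A[beta,alpha] A[alpha]^+ A[alpha,beta],
    where beta is the complement of alpha in {0, ..., p-1} and
    A[alpha]^+ is the Moore-Penrose generalized inverse."""
    p = A.shape[0]
    alpha = np.asarray(alpha)
    beta = np.setdiff1d(np.arange(p), alpha)
    A_pinv = np.linalg.pinv(A[np.ix_(alpha, alpha)])
    return A[np.ix_(beta, beta)] - A[np.ix_(beta, alpha)] @ A_pinv @ A[np.ix_(alpha, beta)]

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5))
A = W.T @ W                                # a matrix in S_+(5, 3)
print(schur_complement(A, [0, 1]).shape)   # (3, 3); PSD because A is PSD
```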

Convention: For convenience in subsequent sections, unless otherwise stated, we always assume that given matrices are non-zero.

2.4. General-position vectors and matrices

Let $v_1,\dots,v_p$ be $p$ vectors in $\mathbb{R}^d$. These vectors are said to be in general position in $\mathbb{R}^d$ if any $d$ vectors among them are linearly independent. Also, a square matrix $A$ of rank $d$ is said to be in general position if every $d\times d$ principal submatrix of $A$ is non-singular.

Next we proceed with two lemmas and a corollary due to Eaton and Perlman [4] and Malley [13]. For completeness, and because the notation under consideration is slightly different, we prove these results in the Appendix.

Lemma 2.1

[4], [13]

If $p \ge d$ vectors are randomly selected in $\mathbb{R}^d$ (with a probability distribution dominated by the Lebesgue measure), then with probability one they are in general position.

Lemma 2.1 therefore implies that being in general position is a generic property of any $p \ge d$ vectors in $\mathbb{R}^d$. The next lemma shows that there is a relationship between general-position vectors and matrices.

Lemma 2.2

Let $A \in S_+(p,d)$. Then $A$ is in general position if and only if there are general-position vectors $v_1,\dots,v_p \in \mathbb{R}^d$ such that
$$A = (v_1 \cdots v_p)^\top (v_1 \cdots v_p) = (v_i^\top v_j)_{1\le i,j\le p}. \qquad (2.2)$$

Corollary 2.1

[4], [13]

Suppose $x_1,\dots,x_n$ are $n$ observations from a $p$-variate distribution. Then with probability one the sample covariance matrix $S$ and the sample correlation matrix $R$ are in general position.

A useful property of a general-position matrix $A \in S_+(p,d)$ is that, in some sense, it can be extended to another general-position matrix in $S_+(p+1,d)$ or $S_+(p+1,d+1)$. The next lemma formalizes this fact.

Lemma 2.3

Let $A \in S_+(p,d)$ be in general position.

  • (a)

    There exist non-zero vectors $w$ and $v \in \mathbb{R}^p$ such that $A - ww^\top$ and $\begin{pmatrix} 1 & v^\top \\ v & A \end{pmatrix}$ are general-position matrices, respectively, in $S_+(p,d)$ and $S_+(p+1,d)$.

  • (b)

    If $d \le p-1$, then there exists a vector $u \in \mathbb{R}^p$ such that $A + uu^\top \in S_+(p,d+1)$ is in general position. This in turn implies that the matrix
$$\begin{pmatrix} 1 & u^\top \\ u & A + uu^\top \end{pmatrix} \in S_+(p+1,d+1)$$
is in general position.

Proof

(a) Let $\tau = [d]$ and choose a non-zero vector $x_\tau \in \mathbb{R}^d$ such that $x_\tau^\top A[\tau]\, x_\tau < 1$. We set
$$x = \begin{pmatrix} x_\tau \\ 0 \end{pmatrix} \in \mathbb{R}^p \quad\text{and}\quad w = Ax = \begin{pmatrix} A[\tau]\, x_\tau \\ A[\gamma,\tau]\, x_\tau \end{pmatrix},$$
where $\gamma = V \setminus \tau$. Note that $A \succeq 0$ and $x^\top A x = x_\tau^\top A[\tau]\, x_\tau < 1$. Thus we have $1 - w^\top A^\dagger w = 1 - x^\top A x > 0$, and therefore $B = A - ww^\top \in S_+(p)$, by Theorem 1.20 in [19]. Now we claim that $B$ has rank $d$ and is in general position. It suffices to show that $B[\alpha] = A[\alpha] - w[\alpha]\, w[\alpha]^\top \succ 0$ for every set $\alpha = \{i_1,\dots,i_d\} \subseteq V$. Let $\beta = V \setminus \alpha$. Then
$$w[\alpha] = (Ax)[\alpha] = \begin{pmatrix} A[\alpha] & A[\alpha,\beta] \end{pmatrix}\begin{pmatrix} x[\alpha] \\ x[\beta] \end{pmatrix} = A[\alpha]\, x[\alpha] + A[\alpha,\beta]\, x[\beta].$$
Thus, if we set $y_\alpha = x[\alpha] + A[\alpha]^{-1}A[\alpha,\beta]\, x[\beta]$, then $A[\alpha]\, y_\alpha = w[\alpha]$. Our claim is established if we can show that
$$\begin{pmatrix} 1 & w[\alpha]^\top \\ w[\alpha] & A[\alpha] \end{pmatrix} \succ 0, \quad\text{or equivalently}\quad 1 - w[\alpha]^\top A[\alpha]^{-1} w[\alpha] = 1 - y_\alpha^\top A[\alpha]\, y_\alpha > 0.$$
We verify that the right-hand side inequality holds by writing
$$\begin{aligned} y_\alpha^\top A[\alpha]\, y_\alpha &= \big(x[\alpha] + A[\alpha]^{-1}A[\alpha,\beta]\, x[\beta]\big)^\top A[\alpha]\, \big(x[\alpha] + A[\alpha]^{-1}A[\alpha,\beta]\, x[\beta]\big) \\ &= x[\alpha]^\top A[\alpha]\, x[\alpha] + 2\, x[\alpha]^\top A[\alpha,\beta]\, x[\beta] + x[\beta]^\top A[\beta,\alpha]\, A[\alpha]^{-1} A[\alpha,\beta]\, x[\beta] \\ &= x[\alpha]^\top A[\alpha]\, x[\alpha] + 2\, x[\alpha]^\top A[\alpha,\beta]\, x[\beta] + x[\beta]^\top A[\beta]\, x[\beta] - x[\beta]^\top A[\beta|\alpha]\, x[\beta] \\ &= x^\top A x - x[\beta]^\top A[\beta|\alpha]\, x[\beta] \\ &\le x^\top A x \quad (\text{since } A \succeq 0 \text{ and therefore } A[\beta|\alpha] \succeq 0) \\ &< 1. \end{aligned}$$
Therefore, $B[\alpha] \succ 0$ and $\mathrm{rank}(B[\alpha]) = |\alpha| = d$. This shows that $B = A - ww^\top \in S_+(p,d)$ is in general position. To prove the second half of Part (a) we proceed similarly. First we choose a vector $y_\tau \in \mathbb{R}^d$ such that $y_\tau^\top A[\tau]\, y_\tau = 1$. Now we set
$$y = \begin{pmatrix} y_\tau \\ 0 \end{pmatrix} \in \mathbb{R}^p \quad\text{and}\quad v = Ay = \begin{pmatrix} A[\tau]\, y_\tau \\ A[\gamma,\tau]\, y_\tau \end{pmatrix}.$$
Then we have $1 - v^\top A^\dagger v = 1 - y^\top A y = 0$. Consequently, $B = A - vv^\top \in S_+(p,d-1)$ is in general position. This in turn implies that
$$\begin{pmatrix} 1 & v^\top \\ v & A \end{pmatrix} \in S_+(p+1,d)$$
is in general position. (b) First note that for a vector $u \in \mathbb{R}^p$, $\mathrm{rank}(A + uu^\top) = \mathrm{rank}(A)$ if and only if $u \in \mathrm{Range}(A)$; otherwise $\mathrm{rank}(A + uu^\top) = \mathrm{rank}(A)+1$ [9]. Now it suffices to find a vector $u \in \mathbb{R}^p \setminus \mathrm{Range}(A)$ such that $p_\alpha(u) = \det(A[\alpha] + u[\alpha]\, u[\alpha]^\top) \ne 0$ for each $\alpha = \{i_1,\dots,i_{d+1}\} \subseteq V$. For this, we note that $\mathrm{Range}(A)$ and the zero sets of the polynomials $p_\alpha(x)$ are all proper algebraic subsets of $\mathbb{R}^p$ and therefore have measure zero. Thus we can choose a vector $u$ in the non-empty set $\mathbb{R}^p \setminus \big(\mathrm{Range}(A) \cup \bigcup_\alpha \{x : p_\alpha(x) = 0\}\big)$. For this vector the matrix $A + uu^\top \in S_+(p,d+1)$ is in general position. The rest of the proof is similar to that of Part (a). □

Remark 2.2

As the proof shows, Part (a) of Lemma 2.3 holds in particular for $w = Ax$, where $x = \begin{pmatrix} x_\tau \\ 0 \end{pmatrix}$ and $x_\tau \in \mathbb{R}^d$ satisfies $x_\tau^\top A[\tau]\, x_\tau < 1$.
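Part (b) of Lemma 2.3 is easy to check numerically. The sketch below (ours, not from the paper) draws a random direction $u$, which with probability one lies outside $\mathrm{Range}(A)$ when $d < p$, so the rank-one update $A + uu^\top$ has rank $d+1$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, d = 5, 3
W = rng.standard_normal((d, p))
A = W.T @ W                        # with probability one, general position in S_+(p, d)

u = rng.standard_normal(p)         # almost surely outside Range(A) since d < p
B = A + np.outer(u, u)

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))  # 3 4
# With probability one every (d+1) x (d+1) principal minor of B is
# non-zero, i.e. B is in general position in S_+(p, d+1).
```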

2.5. Graphical Gaussian models

Let $G=(V,E)$ be a graph with $p$ vertices. A Gaussian distribution $N_p(0,\Sigma)$ is said to be Markov with respect to $G$ if $(\Sigma^{-1})_{i,j} = 0$ whenever $ij \notin E$. The graphical Gaussian model over $G$, denoted by $N(G)$, is the family of all $N_p(0,\Sigma)$ that are Markov with respect to $G$. Let
$$Z_G = \{A \in S(p) : A_{i,j} = 0 \text{ for each } ij \notin E\} \quad\text{and}\quad P_G = Z_G \cap S_{++}(p).$$
Then $Z_G$ is an $|E|$-dimensional linear space and $P_G$ is an open convex cone in $Z_G$. Note that $P_G$ is in fact the set of inverse covariance matrices for the graphical Gaussian model $N(G)$.

2.6. The maximum likelihood problem

Suppose $x_1,\dots,x_n$ are $n$ observations taken from $N_p(0,\Sigma) \in N(G)$. Then the MLE of $\Sigma$ is the solution to the following optimization problem [8]:
$$\operatorname*{arg\,max}_{\Sigma} \Big( -\frac{n}{2}\log\det\Sigma - \frac{n}{2}\mathrm{tr}(\Sigma^{-1}S) \Big) \quad\text{subject to } (\Sigma^{-1})_{i,j} = 0 \text{ for } ij \notin E, \ \Sigma \succ 0.$$
Recall that $S = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top$ is the sample covariance matrix. In terms of the inverse covariance matrix $\Omega = \Sigma^{-1}$, this optimization problem can be recast as the convex optimization problem
$$\operatorname*{arg\,min}_{\Omega} \Big( -\frac{n}{2}\log\det\Omega + \frac{n}{2}\mathrm{tr}(\Omega S) \Big) \quad\text{subject to } \Omega \in P_G. \qquad (2.3)$$
A standard optimization procedure, such as in [8] or [2], shows that the dual of (2.3) is the concave optimization problem
$$\operatorname*{arg\,max}_{\Sigma}\ \log\det\Sigma \quad\text{subject to } \Sigma_{i,j} = S_{i,j} \text{ for } ij \in E, \ \Sigma \succ 0, \qquad (2.4)$$
which has a unique solution if and only if the feasible set determined by Eq. (2.4) is non-empty. Note that the existence of a matrix $\Sigma$ satisfying the conditions in Eq. (2.4) is essentially a positive definite completion problem. For future reference we record the conclusion as follows.

Proposition 2.1

Suppose $x_1,\dots,x_n$ are $n$ observations taken from $N_p(0,\Sigma) \in N(G)$. Then the MLE of $\Sigma$ exists if and only if there is a matrix $P \in S_{++}(p)$ such that $P_{i,j} = S_{i,j}$ for every $ij \in E$.
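Proposition 2.1 suggests a practical existence check. The sketch below (ours; a minimal version of the iterative proportional scaling procedure mentioned in Section 1, with the cliques of $G$ supplied by hand) iterates on the concentration matrix so that the completion constraint of Eq. (2.4) holds on every clique; if the iteration converges to a positive definite matrix, the MLE exists.

```python
import numpy as np

def ips(S, cliques, p, sweeps=500):
    """Iterative proportional scaling for a Gaussian graphical model:
    cyclically adjust the clique blocks of the concentration matrix K
    so that the corresponding blocks of K^{-1} match S. When the MLE
    exists, K^{-1} converges to it."""
    K = np.eye(p)
    for _ in range(sweeps):
        for c in cliques:
            ix = np.ix_(c, c)
            Sigma_cc = np.linalg.inv(K)[ix]
            K[ix] += np.linalg.inv(S[ix]) - np.linalg.inv(Sigma_cc)
    return np.linalg.inv(K)

# The 4-cycle with vertices 0-1-2-3-0; its cliques are the four edges.
cliques = [[0, 1], [1, 2], [2, 3], [0, 3]]
rng = np.random.default_rng(3)
X = rng.standard_normal((3, 4))            # n = 3 = r(C_4) generic observations
S = X.T @ X / 3
Sigma_hat = ips(S, cliques, p=4)
print(np.all(np.linalg.eigvalsh(Sigma_hat) > 0))  # True: the MLE exists
```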

3. The Gaussian rank of a graph

In this section we give another description of the Gaussian rank of a graph, using its unique property with respect to the positive definite completion problem given by Eq. (2.4). An advantage of this alternative description is that it can be verified more easily and can thus be used to derive the properties of the Gaussian rank as a graph parameter.

3.1. An alternative description of the Gaussian rank

Let $G=(V,E)$ be a graph. Two matrices $A, B \in \mathbb{R}^{p\times p}$ are said to match on $G$, denoted by $A =_G B$, if $A_{i,j} = B_{i,j}$ for each $ij \in E$. In other words, two matrices match on $G$ if their Euclidean projections onto $Z_G$ are identical. The matching relation $=_G$ is additive and invariant under scaling [1]. More precisely,
$$A =_G B \ \text{ and } \ U =_G V \;\Longrightarrow\; A + U =_G B + V, \qquad (3.1)$$
$$A =_G B \;\Longrightarrow\; DAD =_G DBD \ \text{ for every diagonal matrix } D. \qquad (3.2)$$

Let us recall that by definition $r(G)$ is the smallest number $r$ with the property that for all observations $x_1,\dots,x_r$, if the sample covariance matrix $S$ is of rank $r$ and in general position, then the MLE exists. In light of Proposition 2.1, an alternative description of $r(G)$ can be given as follows.

Proposition 3.1

Let G be a graph. Then r(G) is the smallest number r with the following property:

  • (∗)

    For every $A \in S_+(p,r)$: if $A$ is in general position, then there exists $P \in S_{++}(p)$ such that $P =_G A$.

Remark 3.1

Similarly, $r_w(G)$ is the smallest number $r$ with the following property:

For almost every $A \in S_+(p,r)$: if $A$ is in general position, then there exists $P \in S_{++}(p)$ such that $P =_G A$.

3.2. Some basic properties of the Gaussian rank

In this subsection we list some basic properties of the Gaussian rank as a graph parameter. Let us keep in mind that $r(G) \le r$, for a positive integer $r$, if the condition (∗) in Proposition 3.1 is satisfied. Also, because of the scale-invariance property of the matching relation given by Eq. (3.2), whenever convenient we can assume that $A \in S_+(p,r)$ is a correlation matrix, that is, the diagonal entries of $A$ are all 1.

Proposition 3.2

Let $r > r(G)$. Then for every general-position matrix $A \in S_+(p,r)$ there is a matrix $P \in S_{++}(p)$ such that $P =_G A$.

Proof

Let $A = \sum_{i=1}^{r} w_i w_i^\top$ be a decomposition of $A$ as in Eq. (A.1). Let us set $B = \sum_{i=1}^{r(G)} w_i w_i^\top$ and $C = \sum_{i=r(G)+1}^{r} w_i w_i^\top$. Since $A$ is in general position, the matrix $B \in S_+(p,r(G))$ is also in general position. Thus there is a matrix $Q \in S_{++}(p)$ such that $Q =_G B$. Eq. (3.1) now shows $A = B + C =_G Q + C \in S_{++}(p)$, as desired. □

In relation to the D&L problem (II), Proposition 3.2 is an important property that intuitively seems obvious. The reason is that if the MLE exists for every generic $r(G)$ observations, then we expect this to be true for every generic $r \ge r(G)$ observations as well.

Proposition 3.3

Let $v \in V$ be a vertex of $G$. Then the following hold.

  • (a)

    $r(G\setminus v) \le r(G) \le r(G\setminus v) + 1$.

  • (b)

    $r(G) = r(G\setminus v) + 1$ if $v$ is adjacent to all other vertices.

Proof

First, without loss of generality we assume that $p \ge 2$ and $v = 1$.

  • (a)

    Let us set $r = r(G)$. The claim is that $r(G\setminus v) \le r$. Since $r(G\setminus v) \le p-1$ it suffices to consider the case $r \le p-1$. Let $A \in S_+(p-1,r)$ be in general position. By Part (a) in Lemma 2.3 there is a vector $w \in \mathbb{R}^{p-1}$ such that
$$B = \begin{pmatrix} 1 & w^\top \\ w & A \end{pmatrix} \in S_+(p,r)$$
is in general position. Let $P \in S_{++}(p)$ be such that $P =_G B$. If we set $\alpha = V(G\setminus v)$, then it is clear that $A =_{G\setminus v} P[\alpha] \in S_{++}(p-1)$.

    Now we show that $r(G) \le r(G\setminus v) + 1$. Let us set $r_0 = r(G\setminus v)$ and let $A \in S_+(p, r_0+1)$ be in general position. We assume without loss of generality that $A$ is a correlation matrix. If we partition $A$ as
$$A = \begin{pmatrix} 1 & u^\top \\ u & B \end{pmatrix},$$
then $C = B - uu^\top \succeq 0$. By using the Guttman rank additivity formula in [19] and the fact that $C[\alpha] = B[\alpha] - u[\alpha]\, u[\alpha]^\top$, for each $\alpha \subseteq V$, we conclude that $C \in S_+(p-1, r_0)$ is in general position. Therefore, there is a matrix $Q \in S_{++}(p-1)$ such that $Q =_{G\setminus v} C$. Let us set
$$P = \begin{pmatrix} 1 & u^\top \\ u & Q + uu^\top \end{pmatrix} \in S_{++}(p).$$
The additive property of the matching relation given by Eq. (3.1) now shows that $Q + uu^\top =_{G\setminus v} B$ and therefore $P =_G A$.

  • (b)

    Suppose $v = 1$ is adjacent to $2,\dots,p$. As before, let us set $r = r(G)$. By Part (a) above, $r(G\setminus v) \ge r - 1$. Thus, it suffices to show that $r(G\setminus v) \le r - 1$. For this, let $A \in S_+(p-1, r-1)$ be in general position. Clearly, it suffices to consider the case $r \le p-1$. By Part (b) of Lemma 2.3 there is a vector $u \in \mathbb{R}^{p-1}$ such that the matrix
$$\begin{pmatrix} 1 & u^\top \\ u & A + uu^\top \end{pmatrix} \in S_+(p, r)$$
is in general position. Therefore, there is a matrix $P \in S_{++}(p)$ such that
$$P = \begin{pmatrix} 1 & u^\top \\ u & Q \end{pmatrix} =_G \begin{pmatrix} 1 & u^\top \\ u & A + uu^\top \end{pmatrix},$$
for some matrix $Q \in S_{++}(p-1)$. Thus, $A + uu^\top =_{G\setminus v} Q$ and consequently $A =_{G\setminus v} Q - uu^\top \in S_{++}(p-1)$. □

Corollary 3.1

Let $H$ be a subgraph of $G$. Then $r(H) \le r(G)$.

Proof

Every subgraph of $G$ is obtained by successively removing a finite number of vertices and edges from $G$. Therefore, it suffices to show that $r(G\setminus v) \le r(G)$, for each vertex $v \in V$, and $r(G\setminus e) \le r(G)$, for each edge $e \in E$. The former follows from Part (a) of Proposition 3.3, and the latter is obvious using the condition (∗) in Proposition 3.1. □

A graph $G$ is said to be the clique sum of two subgraphs $G_1 = (V_1,E_1)$ and $G_2 = (V_2,E_2)$ if $V = V_1 \cup V_2$, $E = E_1 \cup E_2$ and $G[V_1 \cap V_2]$ is a complete graph (including the empty graph). We write this as $G = G_1 \oplus_{V_1\cap V_2} G_2$. The following proposition is now immediate using a standard completion process given by [5] or [10].

Proposition 3.4

Suppose $G = G_1 \oplus_{V_1\cap V_2} G_2$. Then $r(G) = \max\{r(G_1), r(G_2)\}$. In particular, if $G_1,\dots,G_k$ are all the connected components of $G$, then $r(G) = \max\{r(G_i) : i = 1,\dots,k\}$.

Remark 3.2

In relation to the D&L problem (I), we can show (by slightly modifying our proofs and using the alternative description of $r_w(G)$ given by Remark 3.1) that the properties of the Gaussian rank discussed in this subsection are also valid properties of the weak Gaussian rank. This fact, however, will not be used in this paper.

4. The proof of Theorem 1.1

In this section we prove Theorem 1.1, that is, the bounds given by Eq. (1.1). The lower and the upper bounds for r(G) are proved in separate subsections.

4.1. The upper bound: $r(G) \le \delta^*(G)+1$

First we state and prove the following key lemma.

Lemma 4.1

Let $v \in V$ be a vertex of $G$. If $r(G\setminus v) \ge \deg_G(v) + 1$, then $r(G) = r(G\setminus v)$.

Proof

If $\deg_G(v) = 0$, then by Proposition 3.4, $r(G) = \max\{1, r(G\setminus v)\} = r(G\setminus v)$. Thus we assume that $\deg_G(v) \ge 1$. For convenience, let us assume that $v = 1$ and $\mathrm{ne}(v) = \{2,\dots,\deg_G(v)+1\}$. Note that $V(G\setminus v) = \{2,\dots,p\}$. Now we set
$$r_0 = r(G\setminus v), \quad \tau = \{2,\dots,r_0\} \quad\text{and}\quad \gamma = V(G\setminus v) \setminus \tau.$$
Note that $\mathrm{ne}(v) \subseteq \tau$ since $r_0 \ge \deg_G(v)+1$. Let $A \in S_+(p,r_0)$ be in general position. We need to show that there is a matrix $P \in S_{++}(p)$ such that $P =_G A$. For this, we assume that $A$ is a correlation matrix and therefore can be partitioned as
$$A = \begin{pmatrix} 1 & u^\top \\ u & B \end{pmatrix} = \begin{pmatrix} 1 & u[\tau]^\top & u[\gamma]^\top \\ u[\tau] & A[\tau] & A[\tau,\gamma] \\ u[\gamma] & A[\gamma,\tau] & A[\gamma] \end{pmatrix}, \quad\text{where } B = \begin{pmatrix} A[\tau] & A[\tau,\gamma] \\ A[\gamma,\tau] & A[\gamma] \end{pmatrix}.$$
Note also that $u \in \mathbb{R}^{p-1}$ and $B \in S_+(p-1,r_0)$ is in general position. Let us set $x_\tau = A[\tau]^{-1} u[\tau]$. Now we have
$$\begin{pmatrix} 1 & u[\tau]^\top \\ u[\tau] & A[\tau] \end{pmatrix} \succ 0 \quad\text{and therefore}\quad 1 - u[\tau]^\top A[\tau]^{-1} u[\tau] > 0.$$
This implies that $x_\tau^\top A[\tau]\, x_\tau = x_\tau^\top B[\tau]\, x_\tau < 1$. By Part (a) of Lemma 2.3 and Remark 2.2, if we set $w = Bx$ with $x = \begin{pmatrix} x_\tau \\ 0 \end{pmatrix}$, then $B - ww^\top \in S_+(p-1,r_0)$ is in general position. Therefore, there is a matrix $Q \in S_{++}(p-1)$ such that $Q =_{G\setminus v} B - ww^\top$. Let us set
$$P = \begin{pmatrix} 1 & w^\top \\ w & Q + ww^\top \end{pmatrix} \in S_{++}(p).$$
By Eq. (3.1) it is clear that $Q + ww^\top =_{G\setminus v} B$. Also, since $w[\tau] = A[\tau]\, x_\tau = u[\tau]$ and $\mathrm{ne}(v) \subseteq \tau$, we have $P_{1j} = A_{1j}$ whenever $j \in \mathrm{ne}(v)$. Therefore, $P =_G A$. □

We now prove that the upper bound $r(G) \le \delta^*(G)+1$ holds.

Proof

By mathematical induction, let us assume that for any graph $H$ with fewer than $p$ vertices, $r(H) \le \delta^*(H)+1$. Now let $G$ be a graph with $p$ vertices. We assume without loss of generality that $p \ge 2$. Let $v \in V$ be such that $\deg_G(v) = \delta(G)$. For contradiction, let us assume that $r(G) \ge \delta^*(G)+2$. Then by Part (a) of Proposition 3.3 we obtain
$$r(G\setminus v) \ge r(G) - 1 \ge \delta^*(G)+1 \ge \deg_G(v)+1.$$
Lemma 4.1 now implies that $r(G) = r(G\setminus v)$. The induction hypothesis then implies that $r(G) = r(G\setminus v) \le \delta^*(G\setminus v)+1$. The fact that $\delta^*(G) = \max\{\delta(G), \delta^*(G\setminus v)\}$ [15] implies that $r(G) \le \delta^*(G)+1$, a contradiction. □

Remark 4.1

Note that Lemma 4.1 also holds for the weak Gaussian rank; that is, for any graph $G$, if $r_w(G\setminus v) \ge \deg_G(v)+1$, then $r_w(G) = r_w(G\setminus v)$. Consequently, this implies $r_w(G) \le \delta^*(G)+1$. But the latter follows more easily from the fact that $r_w(G) \le r(G) \le \delta^*(G)+1$.

4.2. The lower bound: $\kappa^*(G)+1 \le r(G)$

To prove that this lower bound holds we rely mainly on the Lovász–Saks–Schrijver Theorem in [11]. This theorem relates the connectivity of a graph to certain geometric representations of it. More details are as follows. Let $G = (V,E)$ be a graph. An orthonormal representation of $G$ in $\mathbb{R}^d$ is a function $\phi: V \to \mathbb{R}^d$ assigning a unit vector $u_i$ to each vertex $i \in V$ such that $u_i^\top u_j = 0$ whenever $ij \notin E$. In words, $\phi$ assigns to each vertex a unit vector in $\mathbb{R}^d$ such that the vectors assigned to non-adjacent vertices are orthogonal. Now a general-position orthonormal representation of $G$ in $\mathbb{R}^d$ is an orthonormal representation $\phi$ in $\mathbb{R}^d$ such that the $p$ assigned vectors $\phi(1) = u_1,\dots,\phi(p) = u_p$ are in general position in $\mathbb{R}^d$. The next theorem is crucial for proving the lower bound in Theorem 1.1.

Theorem 4.1

Lovász–Saks–Schrijver  [11], [12]

If G=(V,E) is a graph with p vertices, then the following are equivalent:

  • (i)

    $G$ is $(p-d)$-connected;

  • (ii)

    $G$ has a general-position orthonormal representation in $\mathbb{R}^d$.

Remark 4.2

We note that in the proof of (i) $\Rightarrow$ (ii) in [11] it is shown that the set of general-position orthonormal representations in $\mathbb{R}^d$ is a subvariety of $\mathbb{R}^{pd}$ and, consequently, of probability zero.

We now proceed to prove that the lower bound $\kappa^*(G)+1 \le r(G)$ holds.

Proof

Since $r(H) \le r(G)$ for every subgraph $H$ of $G$, in light of Part (a) and Part (b) in Remark 2.1, it suffices to show that for every graph $G$ we have $r(G) \ge \kappa(G)+1$. First we set $k = \kappa(G)$ and $d = p - k$. By Theorem 4.1 there are general-position unit vectors $u_1,\dots,u_p \in \mathbb{R}^d$ such that $u_i^\top u_j = 0$ for each $ij \notin E$. Let $\Omega = (u_i^\top u_j)_{1\le i,j\le p}$. By Lemma 2.2, $\Omega \in S_+(p,d)$ is in general position. Let $w_1,\dots,w_k \in \mathbb{R}^p$ be a basis of $\mathrm{Null}(\Omega)$ and set $A = (w_1 \cdots w_k)(w_1 \cdots w_k)^\top \in S_+(p,k)$. The proof of Lemma 6.3 in [18] shows that the matrix $A \in S_+(p,k)$ is in general position. Let $P \in S(p)$ be such that $P =_G A$. Then
$$\mathrm{tr}(P\Omega) = \sum_{ij \in E} P_{i,j}\,\Omega_{i,j} = \sum_{ij \in E} A_{i,j}\,\Omega_{i,j} = \mathrm{tr}(A\Omega) = 0 \quad (\text{note that } A\Omega = 0).$$
This shows that $P \notin S_{++}(p)$, since $S_{++}(p)$ is a self-dual cone, that is, $S_{++}(p) = \{Q \in S(p) : \mathrm{tr}(QB) > 0 \text{ for every } B \in S_+(p)\setminus\{0\}\}$, and $\Omega \in S_+(p)\setminus\{0\}$. Thus there is no matrix $P \in S_{++}(p)$ such that $P =_G A$. This shows that $r(G) \ge \kappa(G)+1$. □
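The certificate in this proof is easy to exhibit numerically. The sketch below (ours, not from the paper) builds an explicit general-position orthonormal representation of the 4-cycle in $\mathbb{R}^2$ (here $k = \kappa(C_4) = 2$ and $d = p - k = 2$), forms $\Omega$ and the null-space matrix $A$, and confirms $\mathrm{tr}(A\Omega) = 0$; hence no matrix matching $A$ on $C_4$ can be positive definite, so $r(C_4) \ge 3$.

```python
import numpy as np

t = np.pi / 6
# Unit vectors u1, u2, u3, u4 with u1 orthogonal to u3 and u2 to u4
# (the two non-edges of the 4-cycle 1-2-3-4-1; 0-based pairs (0,2), (1,3)),
# and every pair of the four vectors linearly independent.
U = np.array([[1.0, 0.0],
              [np.cos(t), np.sin(t)],
              [0.0, 1.0],
              [-np.sin(t), np.cos(t)]])
Omega = U @ U.T                    # in S_+(4, 2), zero on the non-edges
print(round(Omega[0, 2], 10), round(Omega[1, 3], 10))  # 0.0 0.0

_, _, Vt = np.linalg.svd(Omega)
W = Vt[2:].T                       # basis of Null(Omega) (p x k)
A = W @ W.T                        # the certificate in S_+(4, 2)
print(round(np.trace(A @ Omega), 10))  # 0.0: any P =_G A has tr(P Omega) = 0,
                                       # so P cannot be positive definite
```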

Remark 4.3

Note that the proof of the lower bound above does not imply that $\kappa^*(G)+1 \le r_w(G)$. This is because, by the Lovász–Saks–Schrijver Theorem and in light of Remark 4.2, the set of all general-position matrices $A \in S_+(p,\kappa(G))$ used in our proof is of probability zero, and this is not sufficient to conclude $r_w(G) > \kappa(G)$.

5. Some applications of Theorem 1.1

A useful application of the bounds given by Theorem 1.1 is that whenever the lower and upper bounds are equal, the Gaussian rank is exactly determined. In this section we briefly discuss for which graphs $\kappa^*(G) = \delta^*(G)$. We also use Theorem 1.1 to obtain a sharp numerical upper bound for the Gaussian ranks of planar graphs.

5.1. Gaussian ranks of symmetric graphs

Let $G=(V,E)$ be a graph. A permutation $\sigma: V \to V$ is said to be an automorphism of $G$ if $ij \in E \Leftrightarrow \sigma(i)\sigma(j) \in E$. The graph $G$ is said to be symmetric if for any two edges $ij$ and $i'j' \in E$ there is an automorphism $\sigma$ of $G$ such that $\sigma(i) = i'$ and $\sigma(j) = j'$ (see Chapter 27 in [7] for a detailed discussion of symmetric graphs). For example, one can check that the graphs given by Fig. 2(a) and (b) are symmetric. The symmetry property in particular implies that $G$ is regular, that is, there is a positive integer $k$ such that $\deg_G(v) = k$ for every vertex $v \in V$. An interesting feature of a symmetric graph $G$ is that $\kappa(G) = \delta(G)$. This consequently implies that $\kappa^*(G) = \delta^*(G)$. Theorem 1.1 therefore implies that for a symmetric graph the Gaussian rank is exactly determined as
$$r(G) = \kappa^*(G)+1 = \delta^*(G)+1. \qquad (5.1)$$
Note that Eq. (5.1) can hold for many non-symmetric graphs as well, such as the regular graph given by Fig. 2(c), or even non-regular graphs such as grids. To the best of our knowledge, the class of all graphs $G$ satisfying $\kappa(G) = \delta(G)$, or even $\kappa^*(G) = \delta^*(G)$, is not fully characterized.


Fig. 2. The graphs in (a) and (b) are symmetric; the graph in (c) is not symmetric. In each graph the equality in Eq. (5.1) holds, and the Gaussian ranks are easily computed to be 4, 4 and 5, respectively.

5.2. Gaussian ranks of random graphs

A random graph $G(\epsilon)$ in the Erdős–Rényi model is a graph in which the edges are selected by a sequence of independent Bernoulli trials with probability $0 < \epsilon < 1$ [7].

Theorem 5.1 (Bollobás et al.)

Almost surely for every random graph $G(\epsilon)$ we have $\kappa^*(G(\epsilon)) = \delta^*(G(\epsilon))$.

The next result now follows from Theorem 5.1 above and Theorem 1.1.

Corollary 5.1

Almost surely for every random graph $G(\epsilon)$ we have
$$r(G(\epsilon)) = \kappa^*(G(\epsilon))+1 = \delta^*(G(\epsilon))+1.$$
This in particular implies that almost surely for every random graph $G(\epsilon)$ we have $r_w(G(\epsilon)) \le \kappa^*(G(\epsilon))+1$.

5.3. On the Gaussian ranks of planar graphs

A planar graph is a graph that can be drawn in the plane without its edges crossing each other. In graph theory it is well known that planar graphs are at most 5-degenerate, that is, $\delta^*(G) \le 5$ if $G$ is planar. Theorem 1.1 therefore implies that $r(G) \le 6$ when $G$ is a planar graph. Note that the upper bound 6 is tight. To see this, consider the planar graph $G$ in Fig. 3. Since $\kappa^*(G) = \delta^*(G) = 5$, by Eq. (5.1) we have $r(G) = 6$.


Fig. 3. A planar graph with degeneracy number 5.

Remark 5.1

We note that Gross and Sullivant in [6] prove that for every planar graph $G$, $r_w(G) \le 4$. Comparing this with the tight upper bound $r(G) \le 6$ shows that for some graphs the weak Gaussian rank can be strictly less than the Gaussian rank. For example, for the graph $G$ given by Fig. 3 we have $r_w(G) \le 4 < r(G) = 6$.

Acknowledgments

Special thanks to Steen Andersson, Gérard Letac, Hélène Massam and Bala Rajaratnam for helpful discussions and feedback. I am also grateful to the referees for their detailed, helpful comments.

Appendix.

Proof of Lemma 2.1

First we note that randomly selecting $p$ vectors in $\mathbb{R}^d$ is equivalent to randomly selecting a matrix $W \in \mathbb{R}^{d\times p}$. Let $X = (x_{i,j})$ denote a $d\times p$ matrix of variables. For each $\alpha = \{i_1,\dots,i_d\} \subseteq [p]$ we define the polynomial
$$p_\alpha(X) = \det\begin{pmatrix} x_{1,i_1} & \cdots & x_{1,i_d} \\ \vdots & \ddots & \vdots \\ x_{d,i_1} & \cdots & x_{d,i_d} \end{pmatrix} \quad\text{and}\quad V_\alpha = \{W \in \mathbb{R}^{d\times p} : p_\alpha(W) = 0\}.$$
Each $V_\alpha \subseteq \mathbb{R}^{d\times p}$ is an algebraic set and its (Lebesgue) measure is therefore zero. Thus, $\Pr\big(W \in \mathbb{R}^{d\times p} \setminus \bigcup_\alpha V_\alpha\big) = 1$. This completes the proof, since every $d$ columns of a matrix $W \in \mathbb{R}^{d\times p} \setminus \bigcup_\alpha V_\alpha$ are linearly independent. □

Proof of Lemma 2.2

  • (⇒)

    Suppose $A \in S_+(p,d)$. Let each pair of $(\lambda_1,u_1),\dots,(\lambda_d,u_d)$ denote a positive eigenvalue and its corresponding eigenvector of $A$. Let us set $w_i = \sqrt{\lambda_i}\, u_i$, for $i = 1,\dots,d$. Then we have
$$A = \sum_{i=1}^{d} w_i w_i^\top = W^\top W, \qquad (A.1)$$
where $W = (w_1 \cdots w_d)^\top = (v_1 \cdots v_p) \in \mathbb{R}^{d\times p}$. Note that $v_1,\dots,v_p \in \mathbb{R}^d$ are the columns of $W$. Suppose for $\alpha = \{i_1,\dots,i_d\} \subseteq [p]$ the vectors $v_{i_1},\dots,v_{i_d}$ are linearly dependent. Then there is a non-zero vector $x_\alpha = (x_{i_1},\dots,x_{i_d})^\top \in \mathbb{R}^d$ such that
$$\sum_{i\in\alpha} x_i v_i = W\begin{pmatrix} x_\alpha \\ 0 \end{pmatrix} = 0 \;\Longrightarrow\; W^\top W\begin{pmatrix} x_\alpha \\ 0 \end{pmatrix} = A\begin{pmatrix} x_\alpha \\ 0 \end{pmatrix} = 0.$$
Thus $A[\alpha]\, x_\alpha = 0$ and consequently $A[\alpha]$ is singular.

  • (⇐)

    Suppose $v_1,\dots,v_p \in \mathbb{R}^d$ are in general position. Let $A$ be defined as in Eq. (2.2). Then for each $\alpha = \{i_1,\dots,i_d\} \subseteq [p]$ the submatrix
$$A[\alpha] = (v_{i_1} \cdots v_{i_d})^\top (v_{i_1} \cdots v_{i_d})$$
is obviously non-singular. □

Proof of Corollary 2.1

Let $X = (x_1 \cdots x_n)^\top = (y_1 \cdots y_p) \in \mathbb{R}^{n\times p}$ be the data matrix. Then by Lemma 2.1, with probability one $y_1,\dots,y_p$, the columns of $X$, are in general position in $\mathbb{R}^n$. The result now follows from Lemma 2.2, since whenever the columns of $X$ are in general position, $S = \frac{1}{n}X^\top X$ is also in general position. □

Footnotes

1. Throughout the paper, we always assume that the observations are independent and identically distributed.