Sample complexity of determining structures of graphical models

Publisher: IEEE

Abstract:
We consider discovering the graph structure of a pairwise Markov random field (MRF) on p binary random variables using n samples from the underlying MRF distribution. We analyze the information-theoretic limitations of this problem under high-dimensional scaling, when the number of connections of each variable in the underlying MRF is bounded by d. We derive both necessary and sufficient conditions on the scaling of the triplet (n, p, d) for asymptotically reliable recovery of the graph structure.
Date of Conference: 23-26 September 2008
Date Added to IEEE Xplore: 04 March 2009
Conference Location: Monticello, IL, USA

SECTION I.

Introduction

The problem of graphical selection is to correctly estimate the graph structure of a Markov random field given samples from the underlying distribution. This problem is central to statistics and machine learning, with consequences for a variety of application domains, e.g., image analysis [2], social networks, and computational biology [6].

In general, a graphical model is a structured representation of a joint distribution of potentially dependent random variables. In the models considered in this paper, we use the vertices of a graph G=(V,E) to represent our (binary) random variables. Let |V|=p. Corresponding to the graph, we associate a joint probability distribution on the p random variables, with the edges E determining the exact joint distribution. Perhaps one major strength of the graphical model representation is the visualization it affords of which models are close to each other. One would expect distributions with similar graphs to be close to each other, and indeed, as an auxiliary result, we formalize this notion in this paper.

The distributions we consider are Markov random fields with respect to their graphs. Namely, each distribution satisfies a spatial Markov-type independence relationship: any random variable, conditioned on the variables adjacent to it in the graph associated with the distribution, is independent of all other variables.

Consider a set G of such models. A sample from a distribution in G is a p-dimensional binary vector, corresponding to the realization of the p random variables. Given several samples from a distribution, formally, the graph selection problem requires us to choose the structure of the model in G that best fits the data.

If the underlying graph is known to be tree-structured, then the problem can be reformulated as a maximum-weight spanning tree problem [5], and solved in polynomial time. On the other hand, for fully general graphs with cycles, the problem is known to be computationally hard [4]. Nonetheless, a variety of practical methods have been proposed, including constraint-based approaches [14], heuristic search, and ℓ1-based relaxations [9], [16], [17]. In particular, it can be proven that some of the ℓ1-based approaches are consistent for model selection under particular scalings of the graph size, degrees, and number of samples [9], [16], and for the case where each degree is bounded by d, the sample complexity has been independently obtained in [3].

Of complementary interest—and the focus of the paper—are the information-theoretic limitations of the graphical selection problems for pairwise binary Markov random fields, also known as Ising models [1]. The Ising model (1) has its origins in statistical physics [1], where it is used to model physical phenomena such as crystal structure and magnetism. Its relative simplicity and the ability to capture practically useful dependencies makes it a candidate for image processing [2], [7], gene network analysis, analysis of social networks, and recently in coding theory starting from [13] (see e.g. [10] for details and a list of references on the topic) and communications, e.g. [8].

At the same time, the dimensionality of the data sets—the number of random variables in question—has increased significantly for many practical problems. For example, gene networks attempt to model interactions between the expression levels of thousands of genes, usually using only tens of samples. The natural question, one that will be addressed here, is whether it is even possible to obtain anything useful from such limited data, and if so, what?

We are given n i.i.d. samples—namely, n p-dimensional vectors—from a fixed but unknown model in G. As mentioned before, we consider inferring the edge set E of the graph alone—namely, given samples from an unknown model in G, we have to infer the structure (edge set E) of the unknown model. This setting is useful for problems in gene regulatory network inference, for example, where there is a severe shortage of data for model selection.

How must n scale with parameters corresponding to the graphs of the G so that there exists a method that recovers the correct probability model? Conversely, for what scalings does any method fail to recover the underlying structure correctly? Note that from this perspective, the graphical selection problem is a channel coding problem, in which the messages are the graphs in our family, and each use of the channel corresponds to taking an i.i.d. sample.

We analyze the information-theoretic limitations of this problem under high-dimensional scaling, when the number of connections of each variable in the underlying MRF is bounded by d. We derive both necessary and sufficient conditions on the scaling of the triplet (n,p,d) for asymptotically reliable recovery of the graph structure.

In Section II, we set up the problem of determining the sample complexity of graphical selection formally. Section III contains the formal statements of our sample complexity results. Section VI contains our results regarding the divergence between Ising models, showing that, roughly speaking, distributions represented by graphs that are similar are indeed closer. Section V outlines the core results that we use for the proofs of results in Section III, and Sections VII and VIII outline the proofs.

SECTION II.

Problem statement

We begin with a precise statement of the problem.

Given an undirected graph G=(V,E), let us associate with each vertex i ∈ V a binary random variable X_i ∈ {−1,+1}. We then consider a distribution over the random vector X = (X_1, …, X_p) with probability mass function

$$P_\Lambda(x) = \frac{1}{Z(\Lambda)}\exp\Big(\sum_{(s,t)\in E}\lambda_{st}\,x_s x_t\Big), \tag{1}$$

where Z(Λ) is the normalization constant.

We will refer to the $\binom{p}{2}$ parameters λ_{st}, s,t ∈ V, as the edge parameters.
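For concreteness, the following short Python sketch (illustrative only; the helper name and the example graph are ours, not from the paper) evaluates (1) by brute-force enumeration for a small model, computing the normalization constant Z(Λ) and the pairwise correlations E[X_sX_t] that play a central role in what follows.

    import itertools
    import math

    def ising_correlations(p, edge_params):
        # edge_params: dict mapping an edge (s, t), s < t, to its parameter lambda_st;
        # pairs absent from the dict have parameter 0 (no edge).
        configs = list(itertools.product([-1, +1], repeat=p))
        weights = [math.exp(sum(lam * x[s] * x[t]
                                for (s, t), lam in edge_params.items()))
                   for x in configs]
        Z = sum(weights)  # normalization constant Z(Lambda) in Eq. (1)
        corr = {(s, t): sum(w * x[s] * x[t] for x, w in zip(configs, weights)) / Z
                for s in range(p) for t in range(s + 1, p)}
        return Z, corr

    # Example: a 3-vertex chain 0 - 1 - 2 with both edge parameters equal to 0.8.
    Z, corr = ising_correlations(3, {(0, 1): 0.8, (1, 2): 0.8})
    print(corr)  # corr[(0, 2)] > 0 even though (0, 2) is not an edge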

At the outset, the sample complexity of inferring an edge set depends on some form of an upper bound on the edge parameters. As the following example indicates, if the edge parameters are allowed to be unboundedly large, it may be impossible to distinguish distinct models (potentially with distinct edge sets), even given infinite amounts of data.

Example 1:

Consider the set of all graphical models on p=3 variables in which every edge parameter is either 0 or λ. Note that there are a total of three such graphs corresponding to graphs on 3 vertices with exactly 2 edges. If λ = ∞, it is easy to see that all these models reduce to the "guilt by association" distribution—i.e., the two configurations [+1,+1,+1] and [−1,−1,−1] have probability 1/2 each. There is hence no way to distinguish between the 3 models in the case λ = ∞.□
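To make the example concrete, take the model whose two edges are (1,2) and (2,3), each with parameter λ (the choice of which two edges is immaterial by symmetry). Directly from (1),

$$P_\lambda(x) = \frac{e^{\lambda(x_1x_2 + x_2x_3)}}{Z(\lambda)}, \qquad Z(\lambda) = 2e^{2\lambda} + 4 + 2e^{-2\lambda},$$

so that

$$P_\lambda([+1,+1,+1]) = P_\lambda([-1,-1,-1]) = \frac{e^{2\lambda}}{2e^{2\lambda}+4+2e^{-2\lambda}} \longrightarrow \frac{1}{2} \quad (\lambda\to\infty),$$

while every other configuration has probability tending to 0; the same limit is obtained for the other two 2-edge graphs.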

To take into account the above observation, our results for a class G depend on B—the smallest real number that satisfies the following for all Λ ∈ G. If Λ is associated with the graph (V,E), with parameters λ_{st} on (s,t) ∈ E, then for all s ∈ V,

$$\sum_{t:\,(s,t)\in E} |\lambda_{st}| \le B. \tag{2}$$

No matter what the class G is, selection of a graphical model is not necessarily the same as simply estimating the pairs of vertices (s,t) for which cov{X_s, X_t} ≠ 0; indeed, given the distribution (1), it can be seen that X_s and X_t could be correlated even when there is no direct edge connecting them.
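As a standard illustration (not taken from the paper), consider the chain 1-2-3 with λ_{12} = λ_{23} = λ and no edge (1,3). By the Markov property, X_1 and X_3 are conditionally independent given X_2, and E[X_1 | X_2] = tanh(λ)·X_2 (similarly for X_3), so

$$\mathbb{E}[X_1X_3] = \mathbb{E}\big[\mathbb{E}[X_1\mid X_2]\,\mathbb{E}[X_3\mid X_2]\big] = \tanh^2(\lambda)\,\mathbb{E}[X_2^2] = \tanh^2(\lambda) > 0,$$

even though (1,3) ∉ E.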

On the other hand, if each nonzero edge parameter is finite, then given infinite amounts of data, two distinct models Λ^(i) and Λ^(j) can always be told apart by looking at the vector of all covariances {E X_s X_t}_{(s,t) ∈ V²}.

We observe that a new elementary proof of the above statement can be constructed along the lines of the arguments presented here (see [11] for full proofs). Moreover, we note that a more general result on exponential families exists along these lines; see [15].

With finite data, covariances can only be approximated. The question therefore is: how fine should these approximations be? This question determines the number of samples that are necessary for the graph selection problem, and a constructive argument determines the number of samples that are sufficient.

In [12], we considered fitting bounded degree models.

Let θ be a $\binom{p}{2}$-length vector of real numbers. We index the components of the vector by pairs {s,t}, s,t ∈ V, and write the components as θ_{st}. Furthermore, assume that for some real number λ > 0, |θ| ≥ λ·1, where the vector inequality is taken to mean a component-wise one. Then the set G_{d,p,θ} contains all models on vertices V such that

  1. each vertex v in V has degree at most d, namely, for the edge set E associated with the model, |{e ∈ E : v ∈ e}| ≤ d, for some d ≥ 1;

  2. if (s,t) ∈ E, the edge parameter is θ_{st}, else it is 0.

We build on the above case to let the edge parameters be arbitrary, as long as they are bounded. Specifically,

$$G_{d,p} = \bigcup_{\theta} G_{d,p,\theta},$$

where the union is taken over all θ such that |θ| ≥ λ·1 and each class in the union satisfies Equation (2). Note that corresponding to each edge set, there is now an (infinite) set of possible models. The task is to determine the edge set of the underlying model; hence it is sufficient to distinguish between the subsets of G_{d,p} that correspond to different edge sets, rather than between different models.

One observation regarding the models described above: in a model belonging to any of the classes above, for all vertices s ∈ V, E[X_s] = 0, since the distribution (1) is invariant under the global sign flip x ↦ −x.

SECTION III.

Results

We begin by stating the results obtained for the sample complexity of inferring the structure of graphical models. The following theorem addresses the sufficiency part.

Theorem 1:

Let 0 < δ ≤ 1 and let

$$\lambda = \min_{(s,t)\in V^2}\lambda_{st}.$$

If

$$n > \frac{(3e^{2B}+1)^2}{\sinh^4(\lambda/2)}\Big(2\log p + \log\frac{1}{\delta}\Big),$$

then there exists a decoder q : X^n → G_{d,p} such that for all Λ^(i), 1 ≤ i ≤ N,

$$P\big(q(X^n)\ne\Lambda^{(i)}\,\big|\,\Lambda^{(i)}\big) < \delta.$$
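To get a feel for the scaling, the following sketch evaluates the sufficient sample size of Theorem 1 numerically (assuming natural logarithms and the constants exactly as reconstructed above; the parameter values are arbitrary):

    import math

    def sufficient_samples(p, B, lam, delta):
        # Right-hand side of the Theorem 1 condition on n.
        prefactor = (3 * math.exp(2 * B) + 1) ** 2 / math.sinh(lam / 2) ** 4
        return prefactor * (2 * math.log(p) + math.log(1.0 / delta))

    for p in (100, 1000, 10000):
        print(p, math.ceil(sufficient_samples(p, B=1.0, lam=0.5, delta=0.01)))
        # n grows only logarithmically in p, but exponentially in B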

The following theorem addresses the number of samples that are necessary for the estimation of the edge structure.

Theorem 2:

For all decoders q : X^n → G_{d,p}, if B ≥ 1 and

$$n \le \frac{1}{2}\left[\frac{e^{B/2}\,d\lambda\,\big(\log\frac{p}{d}-1\big)}{32\,(e^{\lambda}-e^{-\lambda})} + \frac{1}{4}\,d\log\frac{p}{d}\right],$$

then there exists a model Λ^(i) such that

$$P\big(q(X^n)\ne\Lambda^{(i)}\,\big|\,\Lambda^{(i)}\big) \ge \frac{1}{2}.$$

SECTION IV.

Preliminaries

To quantify how far apart models are, and therefore how easy it is to tell them apart, we define distance measures between models. In Section IV-B, we examine the behavior of the edge inference problem for various values of the parameter B.

A. Properties of the distance measures

We use the following distance between models Λ^(i) and Λ^(j):

$$D(\Lambda^{(i)},\Lambda^{(j)}) \stackrel{\mathrm{def}}{=} D\Big(\tfrac{\Lambda^{(i)}+\Lambda^{(j)}}{2}\,\Big\|\,\Lambda^{(i)}\Big) + D\Big(\tfrac{\Lambda^{(i)}+\Lambda^{(j)}}{2}\,\Big\|\,\Lambda^{(j)}\Big), \tag{3}$$

where (Λ^(i)+Λ^(j))/2 denotes the model obtained by averaging the parameters of the two models. This quantity bounds the Chernoff exponent of the hypothesis test between models Λ^(i) and Λ^(j). In Theorem 3, we bound D(i,j) for any two models.

Another useful measure of distance we consider is the symmetrized KL distance between two models, Λ(i) and Λ(j),

$$J(\Lambda^{(i)},\Lambda^{(j)}) \stackrel{\mathrm{def}}{=} D\big(\Lambda^{(i)}\,\big\|\,\Lambda^{(j)}\big) + D\big(\Lambda^{(j)}\,\big\|\,\Lambda^{(i)}\big). \tag{4}$$

Clearly D(Λ^(i),Λ^(j)) ≥ 0 and J(Λ^(i),Λ^(j)) ≥ 0. For the above-defined distances between models Λ^(i) and Λ^(j), we write D(i,j) and J(i,j) where there is no ambiguity. Note that

$$J(i,j) = J(\Lambda^{(i)},\Lambda^{(j)}) = \sum_{(s,t)\in E_i\cup E_j}\big(\lambda^{(i)}_{st}-\lambda^{(j)}_{st}\big)\big(\mathbb{E}_{\Lambda^{(i)}}X_sX_t - \mathbb{E}_{\Lambda^{(j)}}X_sX_t\big). \tag{5}$$

Corollary 1:

Let Λ^(1) be any Ising model. Let Λ^(2) be the model obtained by increasing (decreasing) the parameter corresponding to any single edge. The correlation on that edge increases (decreases) in model Λ^(2), since J(1,2) ≥ 0. □

Lemma 1:

For all models Λ^(i) and Λ^(j), D(i,j) ≤ (1/2) J(i,j).

Proof 1:

The lemma follows by noting that

$$D(i,j) \le J\Big(i,\tfrac{i+j}{2}\Big) + J\Big(j,\tfrac{i+j}{2}\Big),$$

since each divergence appearing in (3) also appears in one of the two J terms, and with a little bit of algebra that

$$J\Big(i,\tfrac{i+j}{2}\Big) + J\Big(j,\tfrac{i+j}{2}\Big) = \frac{1}{2}J(i,j).$$
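For completeness, the algebra can be written out using representation (5). Writing m for the averaged model (Λ^(i)+Λ^(j))/2, whose parameters are λ^(m)_{st} = (λ^(i)_{st}+λ^(j)_{st})/2,

$$J\Big(i,\tfrac{i+j}{2}\Big) + J\Big(j,\tfrac{i+j}{2}\Big) = \sum_{(s,t)}\frac{\lambda^{(i)}_{st}-\lambda^{(j)}_{st}}{2}\Big(\mathbb{E}_{\Lambda^{(i)}}X_sX_t - \mathbb{E}_{m}X_sX_t\Big) + \sum_{(s,t)}\frac{\lambda^{(j)}_{st}-\lambda^{(i)}_{st}}{2}\Big(\mathbb{E}_{\Lambda^{(j)}}X_sX_t - \mathbb{E}_{m}X_sX_t\Big),$$

and the two E_m terms cancel, leaving

$$\frac{1}{2}\sum_{(s,t)}\big(\lambda^{(i)}_{st}-\lambda^{(j)}_{st}\big)\Big(\mathbb{E}_{\Lambda^{(i)}}X_sX_t - \mathbb{E}_{\Lambda^{(j)}}X_sX_t\Big) = \frac{1}{2}J(i,j).$$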

B. Inference for various values of B

Let K_m be the fully connected graph on m+1 vertices. Let Λ_st be a model whose graph is K_m with the edge (s,t) removed, and with every edge parameter equal to λ. We compute the indirect correlation on the missing edge. In the ferromagnetic case, the FKG inequality implies that this is the maximum possible indirect correlation achievable by a model with any graph on these m+1 nodes and edge parameters equal to λ.

Lemma 2:

If B ≥ 1, then for the model Λ_st defined above,

$$\mathbb{E}_{\Lambda_{st}}[X_sX_t] \ge 1 - \frac{2(m+1)e^{3\lambda/2}}{e^{B/2} + (m+1)e^{3\lambda/2}}.$$

Proof 2:

A simple computation yields

$$\frac{P(X_sX_t=1)}{P(X_sX_t=-1)} = \frac{\sum_{l=0}^{m}\binom{m}{l}\exp\big(\tfrac{\lambda}{2}\big[(2l-m+1)^2-4\big]\big)}{\sum_{l=0}^{m}\binom{m}{l}\exp\big(\tfrac{\lambda}{2}(2l-m)^2\big)}.$$

We lower bound the above ratio. To do so, we pick the largest term in the denominator. It can be shown that for B ≥ (1/2) log m, the largest terms are those corresponding to l = m and l = 0, while for 1 ≤ B ≤ (1/2) log m, the largest term lies in the range l > 3m/4 or l < m/4.

Let the largest term correspond to the index l, which without loss of generality satisfies l ≥ m/2. It follows that

$$\frac{P(X_sX_t=1)}{P(X_sX_t=-1)} \;\stackrel{(a)}{\ge}\; \frac{\binom{m}{l}\exp\big(\tfrac{\lambda}{2}\big[(2l-m+1)^2-4\big]\big)}{(m+1)\binom{m}{l}\exp\big(\tfrac{\lambda}{2}(2l-m)^2\big)} = \frac{\exp\big(\tfrac{\lambda}{2}[4l-2m-3]\big)}{m+1} \ge \frac{\exp\big(\tfrac{\lambda}{2}[m-3]\big)}{m+1} = \frac{\exp\big(\tfrac{B}{2}-\tfrac{3\lambda}{2}\big)}{m+1}.$$

Consider two models in G_{d,p,θ} with all their parameters equal to λ: (i) Λ^(1), whose graph is K_d with an edge, say (s,t), removed; and (ii) Λ^(2), whose graph is K_d with a different edge, say (s′,t′), removed. From the FKG inequality, and noting that model Λ^(2) contains the edge (s,t), observe that

$$\frac{P_2(X_sX_t=1)}{P_2(X_sX_t=-1)} \ge \frac{P_1(X_sX_t=1)}{P_1(X_sX_t=-1)}\, e^{2\lambda}.$$

Hence

$$J(1,2) \le 2\,(e^{2\lambda}-1)\,P_1(X_sX_t=1)\,P_1(X_sX_t=-1). \tag{6}$$

Hence if B>1, it follows that

$$J(1,2) = \lambda\big(\mathbb{E}_{\Lambda^{(2)}}X_sX_t - \mathbb{E}_{\Lambda^{(1)}}X_sX_t\big) + \lambda\big(\mathbb{E}_{\Lambda^{(1)}}X_{s'}X_{t'} - \mathbb{E}_{\Lambda^{(2)}}X_{s'}X_{t'}\big) \le 2B\,e^{5\lambda/2}\sinh(\lambda)\,e^{-B/2}.$$

Similarly, considering K_{2k} and constructing two models from it as above, we obtain two models in G_{k,p,θ} separated by a distance that is exponentially small in B.

It follows that if B ≥ 1 and any model is to be inferred with high probability, the sample complexity must grow exponentially in the parameter B.

Lemma 3:

If B ≤ 1/2, then

$$\mathbb{E}_{\Lambda_{st}}[X_sX_t] \le 1 - \frac{2}{e^{2B}+1}.$$

Proof 3:

From the definition of B, it follows that

$$\frac{P(X_sX_t=1)}{P(X_sX_t=-1)} \le e^{2B},$$

implying the lemma.
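Indeed, writing r = P(X_sX_t=1)/P(X_sX_t=−1), the correlation is a monotone function of the odds ratio:

$$\mathbb{E}_{\Lambda_{st}}[X_sX_t] = P(X_sX_t=1) - P(X_sX_t=-1) = \frac{r-1}{r+1} = 1 - \frac{2}{r+1} \le 1 - \frac{2}{e^{2B}+1}.$$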

SECTION V.

Techniques

We briefly survey some of the results that form the core of the attack on Theorems 1 and 2 in the next two subsections respectively.

A. Achievable number of samples using an optimal decoder

Roughly speaking, given distinct models Λ^(i) and Λ^(j), there exist edge(s) (a,b), a,b ∈ V, such that E_{Λ^(i)}[X_aX_b] ≠ E_{Λ^(j)}[X_aX_b]. The sample complexity of estimating a model in G_{d,p,θ} or G_{d,p} depends on how small the sample variance must be for us to notice the above difference in means in a statistically significant manner.

We estimate the difference in means by using the pairwise KL divergence between models (the two-way hypothesis testing step).
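A minimal sketch of this idea in Python (not the paper's decoder; the sup-norm matching rule and the brute-force candidate correlations are illustrative assumptions): estimate each pairwise correlation empirically and select the candidate model whose exact correlation vector is closest to the empirical one.

    import numpy as np

    def empirical_correlations(samples):
        # samples: n x p array with entries in {-1, +1};
        # returns the p x p matrix of empirical means of X_s X_t.
        n = samples.shape[0]
        return samples.T @ samples / n

    def select_model(samples, candidate_corrs):
        # candidate_corrs: dict mapping a model identifier to its p x p matrix
        # of exact correlations E[X_s X_t] (e.g., computed by brute force for small p).
        emp = empirical_correlations(samples)
        return min(candidate_corrs,
                   key=lambda m: np.max(np.abs(candidate_corrs[m] - emp)))

By Hoeffding's inequality each empirical correlation concentrates at rate exp(−nε²/2), so once the candidate correlation vectors are separated (as in Lemma 5 below), a sample size logarithmic in the number of pairwise comparisons suffices.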

B. Necessary number of samples

To estimate the necessary number of samples for decoding the edge structure, we fall back on a version of Fano's inequality. Essentially, Fano's inequality quantifies how much information is gleaned from any single sample, and therefore the number of samples we need before we have sufficient information to infer the edge structure.

1) Fano's inequality

We formalize the above intuition in the following lemma.

Lemma 4:

For any set of w models Λ^(1) through Λ^(w) and all decoders q : X^n → {Λ^(1),…,Λ^(w)}, if

$$n \le \frac{w^2\log\frac{w}{4}}{2\sum_{1\le i<j\le w} J(i,j)},$$

then

$$\max_{1\le i\le w} P\big(q(X^n)\ne\Lambda^{(i)}\,\big|\,\Lambda^{(i)}\big) \ge \frac{1}{2}.$$

Proof 4:

Let Λ be a uniform random variable taking values in the set of models Λ^(1) through Λ^(w). Let

$$P_e = \frac{1}{w}\sum_{1\le i\le w} P\big(q(X^n)\ne\Lambda^{(i)}\,\big|\,\Lambda^{(i)}\big).$$

From Fano's inequality, H(Λ|X^n) ≤ 1 + P_e log(w−1), we obtain

$$\log(w) - 1 - P_e\log(w-1) \le H(\Lambda) - H(\Lambda|X^n) = H(X^n) - H(X^n|\Lambda) \le \frac{n}{w}\sum_{i=1}^{w} D\Big(\Lambda^{(i)}\,\Big\|\,\frac{1}{w}\sum_{j=1}^{w}\Lambda^{(j)}\Big) \le \frac{n}{w^2}\sum_{1\le i,j\le w} D\big(\Lambda^{(i)}\,\big\|\,\Lambda^{(j)}\big) = \frac{n}{w^2}\sum_{1\le i<j\le w} J(i,j).$$

The lemma follows. □

SECTION VI.

Divergence between models

We require a bound on the distance D(i,j) between models Λ(i) and Λ(j) in order to see how far apart the vector of edgewise correlations of the two distributions are.

It is intuitive that the distance D(i,j) between models should be related to the number of edges that exist in only one of the two models. Indeed, Theorem 3 confirms this—D(i,j) is proportional to the matching number of (E_i ∖ E_j) ∪ (E_j ∖ E_i). Recall that a matching of a graph G is a subgraph H of G such that each vertex in H has degree 1, and that the matching number of G is the largest possible number of edges in any matching.
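For small graphs, the matching number of the edge-difference graph appearing in Theorem 3 can be computed by brute force, as in this illustrative sketch (exponential-time, for intuition only):

    from itertools import combinations

    def matching_number(edges):
        # Largest number of pairwise vertex-disjoint edges among `edges`.
        edges = list(edges)
        for size in range(len(edges), 0, -1):
            for subset in combinations(edges, size):
                vertices = [v for e in subset for v in e]
                if len(vertices) == len(set(vertices)):  # edges are pairwise disjoint
                    return size
        return 0

    E_i = {(1, 2), (3, 4)}
    E_j = {(1, 3)}
    diff = (E_i - E_j) | (E_j - E_i)  # edges present in exactly one of the two models
    print(matching_number(diff))      # -> 2, from the disjoint edges (1, 2) and (3, 4)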

Theorem 3:

Let Λ^(i) and Λ^(j) be models with distinct edge sets, and let m be the matching number of the graph (E_i ∖ E_j) ∪ (E_j ∖ E_i). Let

$$\lambda = \min_{(s,t)\in (E_i\setminus E_j)\cup(E_j\setminus E_i)} \lambda_{st}.$$

Then

$$D(i,j) \ge \frac{m}{3e^{2B}+1}\,\sinh^2\Big(\frac{\lambda}{4}\Big).$$

SECTION VII.

Proof outline of Theorem 1

For the class G_{d,p}, we pick the set of edges in the model which best fits the data, given the freedom to pick the edge parameters as we see fit, so long as they are at least λ. If we locate the regions in the space of covariances {E X_s X_t} that could arise from each set of edges, then the separation between the regions determines the precision to which the covariances must be estimated in order to recover the edges accurately.

Suppose we have two models Λ^(1) and Λ^(2). The following lemma shows that adjacent to any vertex s whose neighborhoods differ in the two models, there exists an edge (s,t) such that |E_1[X_sX_t] − E_2[X_sX_t]| is suitably large.

Lemma 5:

Let Λ^(1) be a model with edge set E_1 and Λ^(2) be a model with edge set E_2. For all edges (s,t) ∈ (E_1 ∖ E_2) ∪ (E_2 ∖ E_1),

$$\max_{u\in\{s,t\},\,v\in V}\Big|\mathbb{E}_{\Lambda^{(1)}}X_uX_v - \mathbb{E}_{\Lambda^{(2)}}X_uX_v\Big| \ge \frac{\sinh^2(\lambda/4)}{3e^{2B}+1}.$$

A Hoeffding-type large-deviations result and a union bound yield the following result. If

$$n > \frac{(3e^{2B}+1)^2}{\sinh^4(\lambda/2)}\Big(2\log p + \log\frac{1}{\delta}\Big),$$

then there exists a decoder q : X^n → G_{d,p} such that for all Λ^(i), 1 ≤ i ≤ N,

$$P\big(q(X^n)\ne\Lambda^{(i)}\,\big|\,\Lambda^{(i)}\big) < \delta.$$

SECTION VIII.

Proof outline of Theorem 2

The trick is to find subsets of models in G_{d,p,θ} (for any θ) that determine the necessary regions. Since we determine the necessary region for G_{d,p,θ} no matter what the edge parameters θ are, the necessary region for G_{d,p} follows from the result on G_{d,p,θ}.

We require the following lemma to determine a lower bound on sample complexity.

Lemma 6:

The cardinality of G_{d,p,θ} satisfies, for d ≤ p/2,

$$\left(\Big(\frac{p}{d+1}\Big)!\right)^{d(d+1)/2} \;\le\; |G_{d,p,\theta}| \;\le\; pd\binom{\binom{p}{2}}{pd/2},$$

therefore

$$\log|G_{d,p,\theta}| = \Theta\Big(pd\log\frac{p}{d}\Big).$$

Proof 5:

For the upper bound on |G_{d,p,θ}|, observe that every model in G_{d,p,θ} has at most pd/2 edges. An upper bound on |G_{d,p,θ}| is therefore provided by the number of graphs with at most pd/2 edges, with no restrictions on degrees. To upper bound the latter, note that the number of graphs with exactly r edges is $\binom{\binom{p}{2}}{r}$, and that for d ≤ p/2 the number of graphs with pd/2 edges is greater than the number of graphs with r edges for all r ≤ pd/2; the upper bound on |G_{d,p,θ}| follows.

For the lower bound, we proceed as follows. Group the p vertices into d+1 groups of equal size (throwing away any remaining vertices). Pick a permutation of p/(d+1) elements, and form a bijection from group 1 to group 2 corresponding to the permutation. Similarly, form bijections from group 1 to groups 3,…,d+1 using d−1 other permutations (we use up d permutations in all).

Similarly, use d−1 permutations to connect group 2 to groups 3,…,d+1; use d+1−i permutations to connect group i to groups i+1,…,d+1; and so on.

Therefore, each choice of the 1+2+⋯+d = d(d+1)/2 permutations leads to a different graph satisfying the degree bound d. The lemma follows. □
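With the bounds of Lemma 6 as reconstructed above, the Θ(pd log(p/d)) scaling follows from m! ≥ (m/e)^m and $\binom{N}{k}\le (eN/k)^k$:

$$\log|G_{d,p,\theta}| \;\ge\; \frac{d(d+1)}{2}\,\log\Big(\Big(\frac{p}{d+1}\Big)!\Big) \;\ge\; \frac{d(d+1)}{2}\cdot\frac{p}{d+1}\,\log\frac{p}{e(d+1)} \;=\; \frac{pd}{2}\,\log\frac{p}{e(d+1)},$$

which is Ω(pd log(p/d)) when p/d is large, while

$$\log|G_{d,p,\theta}| \;\le\; \log(pd) + \frac{pd}{2}\,\log\frac{e\binom{p}{2}}{pd/2} \;\le\; \log(pd) + \frac{pd}{2}\Big(1+\log\frac{p}{d}\Big) \;=\; O\Big(pd\log\frac{p}{d}\Big)$$

for d ≤ p/2.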

We reproduce the statement of Theorem 2 here for convenience: for all decoders q : X^n → G_{d,p}, if B ≥ 1 and

$$n \le \frac{1}{2}\left[\frac{e^{B/2}\,d\lambda\,\big(\log\frac{p}{d}-1\big)}{32\,(e^{\lambda}-e^{-\lambda})} + \frac{1}{4}\,d\log\frac{p}{d}\right],$$

then there exists a model Λ^(i) such that

$$P\big(q(X^n)\ne\Lambda^{(i)}\,\big|\,\Lambda^{(i)}\big) \ge \frac{1}{2}.$$

Proof outline 6:

We first consider the necessary region for the class G_{k,p,θ}. On a set of k+1 vertices, consider the following two models: (i) Λ^(1), the completely connected graph with an edge removed, say (s,t); and (ii) Λ^(2), the completely connected graph with a different edge removed, say (s′,t′). Note that

$$\frac{P_2(X_sX_t=1)}{P_2(X_sX_t=-1)} \ge \frac{P_1(X_sX_t=1)}{P_1(X_sX_t=-1)}\,e^{2\lambda},$$

so we obtain, as in (6), that

$$J(1,2) \le 2\,(e^{2\lambda}-1)\,P_1(X_sX_t=1)\,P_1(X_sX_t=-1). \tag{7}$$

The FKG inequality implies that reducing the parameter on every edge of Λ^(1) to λ only reduces the correlation on the edge (s,t). Using Lemma 2 and (7), as long as B > 1,

$$J(1,2) \le 2B\sinh(\lambda)\,e^{-B/2}.$$

We now consider the set of k+1 models, each with a different edge removed from the complete graph. From Lemma 4, if

$$n < \frac{e^{B/2}\log k}{2B\sinh(\lambda)} \le \frac{\log k}{J(1,2)},$$

then the error probability is at least 1/2. Similarly, considering the set of models obtained by adding any single edge to one of the models above, say Λ^(1) (except for the edge that completes the fully connected graph), yields that for the error probability to be < 1/2,

$$n > \frac{2\log p}{e^{\lambda}-e^{-\lambda}}.$$

For the bounded-degree case, we group the p vertices into sets of d+1 vertices, and consider the graph G_0 obtained by fully connecting each subset of d+1 vertices (p/(d+1) cliques of size d+1). Consider the following subset of models: from G_0, remove one edge. If p ≥ 2(d+1), it follows that we obtain $\frac{p}{d+1}\binom{d+1}{2} \ge \frac{pd}{4}$ such models, and for each distinct pair of models i and j so obtained,

$$J(i,j) \le 4B\,(e^{\lambda}-e^{-\lambda})\,e^{-B}.$$

Fano's inequality yields

$$n > \frac{\log\frac{p}{d}-1}{J(1,2)} \;\ge\; \frac{e^{B}\big(\log\frac{p}{d}-1\big)}{4B\,(e^{\lambda}-e^{-\lambda})} \;\ge\; \frac{e^{B/2}\,d\lambda\,\big(\log\frac{p}{d}-1\big)}{32\,(e^{\lambda}-e^{-\lambda})}.$$

Next, we show that on the order of d log(p/d) samples are always required. Observe that for all n,

$$I(X^n;\Lambda) \le H(X^n) \le np.$$

From Lemma 6, reliable recovery requires I(X^n;Λ) to be on the order of log|G_{d,p,θ}|; combined with the previous display,

$$np \;\gtrsim\; pd\log\frac{p}{d}, \qquad \text{i.e.,} \qquad n \;\gtrsim\; d\log\frac{p}{d}.$$

The theorem follows. □

ACKNOWLEDGEMENTS

This work was partially supported by NSF grants CAREER-0545862 and DMS-0528488.

    References

    1.
    R. J. Baxter. Exactly solved models in statistical mechanics. Academic Press, New York, 1982.
    2.
    J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B, 48 (3) :259-279, 1986.
    3.
G. Bresler, E. Mossel, and A. Sly. Reconstruction of Markov random fields from samples: Some easy observations and algorithms. Available at http://arxiv.org/find/all/1/au:+bresler/0/1/0/all/0/1.
    4.
    D. Chickering. Learning Bayesian networks is NP-complete. Proceedings of AI and Statistics, 1995.
    5.
    C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Trans. Info. Theory, IT-14:462-467, 1968.
    6.
    R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, editors. Biological Sequence Analysis. Cambridge University Press, Cambridge, 1998.
    7.
    S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI, 6:721-741, 1984.
    8.
    D. Guo and T. Tanaka. Generic multiuser detection and statistical physics. In M. Honig, editor, Advances in Multiuser Detection. John Wiley and Sons, Inc. To appear.
    9.
    N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 2006. To appear.
    10.
M. Mezard and A. Montanari. Information, physics and computation. Draft of book in preparation, 2008. Draft from http://www.stanford.edu/montanar/BOOK/book.html.
    11.
N. P. Santhanam and M. J. Wainwright. Information-theoretic limits of graphical model selection in high dimensions. Available from http://www.eecs.berkeley.edu/prasadsn/index.html.
    12.
N. P. Santhanam and M. J. Wainwright. Information-theoretic limits of graphical model selection in high dimensions. In Proceedings of the IEEE Symposium on Information Theory, July 2008.
    13.
    N. Sourlas. Spin glass models as error correcting codes. Nature, 339:692-695, June 1989.
    14.
    P. Spirtes, C. Glymour, and R. Scheines. Causation, prediction and search. MIT Press, 2000.
    15.
    M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Technical report, UC Berkeley, Department of Statistics, No. 649, September 2003.
    16.
M. J. Wainwright, P. Ravikumar, and J. Lafferty. High-dimensional graph selection using ℓ1-regularized logistic regression. In NIPS Conference, December 2006.
    17.
    M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94 (1) :19-35, 2007.