Introduction
We consider the problem of learning graphs under attractive Gaussian Markov random fields, where all the partial correlations are nonnegative. Such property is also known as multivariate totally positive of order 2 (MTP2), and its applications include actuarial sciences [1], taxonomic reasoning [2], financial markets [3], [4], factor analysis in psychometrics [5], and graph signal processing [6], [7]. For example, in financial markets, instruments usually have positive dependencies as a result of the market factor [3], [8]. Graph learning under the attractive Gaussian Markov random fields can be formulated as estimating the precision matrices under MTP2 constraints.
Recent works reported that MTP2 constraints can reduce the sample complexity to make the maximum likelihood estimator exist [2], [5], [9], [10]. One interesting result provided in [2], [5] showed that the maximum likelihood estimator under MTP2 constraints exists if the sample size satisfies n ≥ 2, independent of the underlying dimension p. This result leads to a significant reduction from n ≥ p, which is necessary for the maximum likelihood estimator to exist under unconstrained Gaussian graphical models. Aside from Gaussian distributions, the advantages of MTP2 constraints have also been explored in the binary exponential family, showing that the maximum likelihood estimator may exist with only n = p observations [10], while n ≥ 2p is required if there are no MTP2 constraints. Such advantages of MTP2 constraints are significant in the high-dimensional regime, where the samples are usually limited compared with the dimension.
Graph learning under Gaussian Markov random fields has been widely studied. One well-known method is the graphical lasso [11], [12], [13], which is formulated as the ℓ1-norm penalized Gaussian maximum likelihood estimation. Various extensions of graphical lasso and their theoretical properties have also been studied [14], [15], [16], [17], [18], [19], [20], [21], [22]. For the case of attractive Gaussian Markov random fields, the authors in [2], [23] proposed the Gaussian maximum likelihood estimation method under MTP2 constraints, and developed a block-coordinate descent (BCD) algorithm. The authors in [6] incorporated the ℓ1-norm regularization and connectivity constraints, and developed a similar scheme by cyclically updating each column/row. Note that BCD algorithms are usually time-consuming in high-dimensional problems. In another direction, one may consider second-order methods. However, they usually need to compute the (approximate) inverse Hessian, which leads to computational inefficiency. In addition, it is still unknown whether the ℓ1-norm approaches can succeed to recover the underlying graph edges under attractive Gaussian Markov random fields.
The main contributions of this paper are threefold:
We propose a computationally efficient projected Newton-like algorithm to solve the ℓ1-norm regularized Gaussian maximum likelihood estimation under MTP2 constraints. By exploiting the structure of the Gaussian maximum likelihood estimation, the proposed algorithm significantly reduces the computational cost in computing the approximate Newton direction.
We prove that the ℓ1-norm regularized Gaussian maximum likelihood estimation method can recover the graph edges correctly under irrepresentability condition.
Numerical experiments on synthetic and financial time-series data demonstrate the effectiveness of the proposed algorithm. It is observed that the proposed algorithm takes less computational time than the state-of-the-art methods.
The remainder of the paper is organized as follows. Preliminaries and related work are provided in Section II. We propose a novel algorithm, and present the theoretical results on the successful edge recovery guarantees in Section III. Experimental results are provided in Section IV, and conclusions are made in Section V.
Notations: Lower case bold letters denote vectors and upper case bold letters denote matrices. Both Xij and [X]ij denote the (i,j)-th entry of X. Let ⊗ be the Kronecker product, and supp(X) = {(i,j)|Xij ≠ 0}. [p] denotes the set {1,…,p}. X⊤ denotes transpose of X.
Preliminaries and Related Work
In this section, we first introduce the attractive Gaussian Markov random fields, then present related works.
A. Attractive Gaussian Markov Random Fields
We denote an undirected graph by
In this paper, we focus on the attractive Gaussian Markov random fields, where the precision matrix Θ is a symmetric Mmatrix [24], i.e.,
We aim to estimate a sparse precision matrix under attractive Gaussian Markov random fields given independent and identically distributed observations y(1),…,y(n) ∈ ℝp. Let
We propose a second-order algorithm in Section III to solve this problem.
B. Related Work
Sparse precision matrix estimation under Gaussian Markov random fields has been extensively studied in the literature. One popular approach is the ℓ1-norm regularized Gaussian maximum likelihood estimation [11], [12], [13], and numerous algorithms have been proposed for solving this problem. A representative, yet not exhaustive, list of works available in the literature include: block coordinate ascent method [11], [25], Nesterov’s smooth gradient method [12], projected gradient method [26], projected quasi-Newton [27], augmented Lagrangian method [28], inexact interior point method [29], primal proximal-point with Newton-CG method [30], and Newton’s method with quadratic approximation [31], [32], [33]. However, all the methods mentioned above cannot be directly extended to estimate precision matrices in our problem because of MTP2 constraints.
To estimate precision matrices under MTP2 constraints, one option is using the Gaussian maximum likelihood estimator [5], [34], and the sign constraint can promote the sparsity of the solution implicitly. To solve this problem, a primal algorithm [2] and a dual algorithm [23] were proposed based on block coordinate descent. The authors in [6] proposed the ℓ1-norm regularized Gaussian maximum likelihood estimation, and updated each column/row of the variable by solving a nonnegative quadratic program. In addition, there is growing interest in learning graphs under the generalized graph Laplacian models [6], [23], [35], where the precision matrices satisfy MTP2 constraints. In this paper, we aim to propose an efficient algorithm for estimating sparse precision matrices under MTP2 constraints, and establish the successful edge recovery guarantees.
Proposed Algorithm and Theoretical Results
In this section, we first derive a second-order algorithm to solve the ℓ1-norm regularized Gaussian maximum likelihood estimation under MTP2 constraints, then provide theoretical results on successful recovery of graph edges.
A. Proposed Algorithm
To solve Problem (2), we propose a projected Newton-like algorithm. In each iteration, we partition the algorithm variables into two sets, i.e., free and restricted sets, and only update the variables in the free set while fixing the variables in the restricted set. The restricted set of variables is defined by
Now we construct the projected Newton-like step as follows,
The approximate Newton direction in (6) may involve computing the inverse of the Hessian matrix with the dimension p2 × p2 or solving a system of linear equations of the same dimension, which is computationally challenging. In what follows, we aim to simplify the computation in (6). We construct the gradient scaling matrix Qkas follows,
Following from the property of Kronecker product that vec(ABC) = (C⊤⊗A)vec(B), we obtain
As a result, we can get
Therefore, we have
In each iteration, we only update the variables in the free set, i.e.,
Note that the Newton-type methods for solving our problem (2) are usually computationally expensive, because of the computations of the (approximate) inverse of the Hessian matrices with the dimension p2 × p2. However, it is observed that our constructed iterates as shown above only involves the matrix multiplication with the dimension p × p and gradient computation. The step size can be computed by the Armijo step-size rule.
B. Theoretical Results
In this subsection, we provide theoretical results to show that the estimator X⋆ defined in (2) can recover the underlying graph edges correctly under irrepresentability condition.
Let
Assumption 1. There exists some α ∈ (0,1] such that
Assumption 2. The nonzero elements of the underlying precision matrix Θ satisfy
Assumption 1 presents the irrepresentability condition that is almost necessary for the ℓ1-norm approaches to recover the supports correctly [14]. Assumption 2 imposes a lower bound on the minimum absolute value of nonzero elements of Θ.
Theorem 1. Suppose the sample covariance matrix satisfies
where cλ is a positive constant.
Theorem 1 shows that the support of the minimizer X⋆ of Problem (2) is the same with that of the underlying precision matrix Θ under the irrepresentability condition, implying that all the underlying graph edges can be identified correctly through X⋆. In addition, the condition
Experimental Results
In this section, we present numerical results on synthetic data and financial time-series data to verify the performance of the proposed algorithm. All the experiments are conducted on a PC with a 2.8 GHz Inter Core i7 CPU and 16 GB RAM.
A. Synthetic Data
We consider to learn Barabasi-Albert graphs that are useful in many applications. The graph structure and its weights associated with the edges are generated randomly. We obtain a weighted adjacency matrix W, where Wij denotes the graph weight between node i and node j. We construct the underlying precision matrix by Θ = D− W, where Dis a diagonal matrix, which is generated to ensure that Θ is an M-matrix and the irrepresentability condition in Assumption 1 holds.
Given the underlying precision matrix Θ ∈ ℝp×p, we generate n independent observations
We compare the computational time with the state-of-the-art
Next, we show that the estimator defined in (2) can recover the graph edges correctly under irrepresentability condition. We use
It is observed in Figure 1 that our algorithm can achieve the
B. Financial Time-series Data
The MTP2 models are justified well on the stock data since the market factor leads to the positive dependencies among stocks [3]. The data is collected from 189 stocks composing the S&P 500 index during a period from Oct. 1st 2017 to Jan. 1st 2018, resulting in 62 observations per stock, i.e., p = 189 and n = 62. We construct the log-returns data matrix by
It is observed in Figure 2 that the performance of our algorithm is better than
Conclusions
In this paper, we have proposed a projected Newton-like algorithm to solve the ℓ1-norm regularized Gaussian maximum likelihood estimation under MTP2 constraints. The proposed algorithm significantly reduces the computational cost in computing the approximate Newton direction. We have proved that our method can recover the graph edges correctly under the irrepresentability condition, which has been verified by numerical results. Experiments have demonstrated that the proposed algorithm takes significantly less computational time than the state-of-the-art methods.