A Fast Algorithm for Graph Learning under Attractive Gaussian Markov Random Fields

Publisher: IEEE

Abstract:
We consider the problem of graph learning under Gaussian Markov random fields in which all partial correlations are nonnegative. Such a model is called an attractive Gaussian Markov random field and has received considerable attention in recent years. The graph learning problem under this model can be formulated as the ℓ1-norm regularized Gaussian maximum likelihood estimation of the precision matrix under sign constraints. In this paper, we propose a projected Newton-like algorithm that is computationally efficient. By exploiting the structure of the Gaussian maximum likelihood estimation problem, the proposed algorithm significantly reduces the computational cost of computing the approximate Newton direction. We then prove that the proposed method can recover the graph edges correctly under the irrepresentability condition. Numerical results on synthetic and financial time-series data sets demonstrate the effectiveness of the proposed method.
Date of Conference: 31 October 2021 - 03 November 2021
Date Added to IEEE Xplore: 04 March 2022
INSPEC Accession Number: 21708159
Conference Location: Pacific Grove, CA, USA

SECTION I.

Introduction

We consider the problem of learning graphs under attractive Gaussian Markov random fields, where all the partial correlations are nonnegative. This property is also known as multivariate total positivity of order 2 (MTP2), and its applications include actuarial sciences [1], taxonomic reasoning [2], financial markets [3], [4], factor analysis in psychometrics [5], and graph signal processing [6], [7]. For example, in financial markets, instruments usually exhibit positive dependencies as a result of the market factor [3], [8]. Graph learning under attractive Gaussian Markov random fields can be formulated as estimating the precision matrix under MTP2 constraints.

Recent works have reported that MTP2 constraints can reduce the sample complexity required for the maximum likelihood estimator to exist [2], [5], [9], [10]. One interesting result in [2], [5] shows that the maximum likelihood estimator under MTP2 constraints exists if the sample size satisfies n ≥ 2, independent of the underlying dimension p. This is a significant reduction from n ≥ p, which is necessary for the maximum likelihood estimator to exist under unconstrained Gaussian graphical models. Beyond Gaussian distributions, the advantages of MTP2 constraints have also been explored in the binary exponential family, where the maximum likelihood estimator may exist with only n = p observations [10], while n ≥ 2p is required without MTP2 constraints. These advantages are significant in the high-dimensional regime, where samples are usually limited compared with the dimension.

Graph learning under Gaussian Markov random fields has been widely studied. One well-known method is the graphical lasso [11], [12], [13], which is formulated as ℓ1-norm penalized Gaussian maximum likelihood estimation. Various extensions of the graphical lasso and their theoretical properties have also been studied [14], [15], [16], [17], [18], [19], [20], [21], [22]. For the case of attractive Gaussian Markov random fields, the authors in [2], [23] proposed Gaussian maximum likelihood estimation under MTP2 constraints and developed a block-coordinate descent (BCD) algorithm. The authors in [6] incorporated ℓ1-norm regularization and connectivity constraints, and developed a similar scheme that cyclically updates each column/row. Note that BCD algorithms are usually time-consuming in high-dimensional problems. Alternatively, one may consider second-order methods; however, they usually need to compute the (approximate) inverse Hessian, which is computationally inefficient. In addition, it is still unknown whether ℓ1-norm approaches can successfully recover the underlying graph edges under attractive Gaussian Markov random fields.

The main contributions of this paper are threefold:

  • We propose a computationally efficient projected Newton-like algorithm to solve the ℓ1-norm regularized Gaussian maximum likelihood estimation under MTP2 constraints. By exploiting the structure of the Gaussian maximum likelihood estimation, the proposed algorithm significantly reduces the computational cost of computing the approximate Newton direction.

  • We prove that the ℓ1-norm regularized Gaussian maximum likelihood estimation method can recover the graph edges correctly under the irrepresentability condition.

  • Numerical experiments on synthetic and financial time-series data demonstrate the effectiveness of the proposed algorithm. It is observed that the proposed algorithm takes less computational time than the state-of-the-art methods.

The remainder of the paper is organized as follows. Preliminaries and related work are provided in Section II. We propose a novel algorithm, and present the theoretical results on the successful edge recovery guarantees in Section III. Experimental results are provided in Section IV, and conclusions are made in Section V.

Notations: Lowercase bold letters denote vectors and uppercase bold letters denote matrices. Both X_ij and [X]_ij denote the (i,j)-th entry of X. Let ⊗ be the Kronecker product, and supp(X) := {(i,j) | X_ij ≠ 0}. [p] denotes the set {1,…,p}. X^⊤ denotes the transpose of X. ‖x‖, ‖X‖_F and ‖X‖_2 denote the Euclidean norm, the Frobenius norm and the operator norm, respectively. |||X|||_∞ denotes the ℓ_∞/ℓ_∞-operator norm given by |||X|||_∞ := max_{i=1,…,p} Σ_{j=1}^p |X_ij|. Let ‖x‖_max = max_i |x_i| and ‖x‖_min = min_i |x_i|. S^p_+ and S^p_++ denote the sets of positive semi-definite and positive definite matrices of size p × p, respectively.

SECTION II.

Preliminaries and Related Work

In this section, we first introduce the attractive Gaussian Markov random fields, then present related works.

A. Attractive Gaussian Markov Random Fields

We denote an undirected graph by G = (V, E), where V = {1,…,p} is the vertex set and E is the edge set. Let y = (y_1,…,y_p)^⊤ be a zero-mean p-dimensional random vector following y ∼ N(0, Σ). We associate the random variables y_1,…,y_p with the vertex set V. Then the random vector y forms a Gaussian Markov random field with respect to the graph G = (V, E), where

\Theta_{ij} \neq 0 \;\Longleftrightarrow\; (i,j) \in E, \;\forall\, i \neq j, \qquad \Theta_{ij} = 0 \;\Longleftrightarrow\; y_i \perp\!\!\!\perp y_j \mid y_{[p] \setminus \{i,j\}},   (1)

where Θ := Σ^{-1} is called the precision matrix.

In this paper, we focus on attractive Gaussian Markov random fields, where the precision matrix Θ is a symmetric M-matrix [24], i.e., Θ_ij ≤ 0 for any i ≠ j. In other words, all the partial correlations are nonnegative in attractive Gaussian Markov random fields.

We aim to estimate a sparse precision matrix under attractive Gaussian Markov random fields, given independent and identically distributed observations y^(1),…,y^(n) ∈ ℝ^p. Let S = (1/n) Σ_{i=1}^n y^(i) (y^(i))^⊤ be the sample covariance matrix. Sparse precision matrix estimation under attractive Gaussian Markov random fields can then be formulated as the ℓ1-norm regularized maximum likelihood estimation under MTP2 constraints,

\hat{X} = \arg\min_{X \in \mathcal{M}_p} \; -\log\det(X) + \mathrm{tr}(XS) + \lambda \sum_{i \neq j} |X_{ij}|,   (2)

where \mathcal{M}_p is the set of all p-dimensional, symmetric, nonsingular M-matrices, which is defined by

\mathcal{M}_p := \{X \in S^p_{++} \mid X_{ij} \leq 0, \;\forall\, i \neq j\}.   (3)
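To make the formulation concrete, the following minimal Python/NumPy sketch evaluates the objective of Problem (2) for a candidate matrix X. The function name and the handling of infeasible inputs are illustrative choices and not part of the paper.

import numpy as np

def objective(X, S, lam):
    # -log det(X) + tr(X S) + lam * sum_{i != j} |X_ij|, as in Problem (2);
    # returns +inf when X is not positive definite (outside S^p_++).
    sign, logdet = np.linalg.slogdet(X)
    if sign <= 0:
        return np.inf
    off_diag = X - np.diag(np.diag(X))
    return -logdet + np.trace(X @ S) + lam * np.abs(off_diag).sum()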

We propose a second-order algorithm in Section III to solve this problem.

B. Related Work

Sparse precision matrix estimation under Gaussian Markov random fields has been extensively studied in the literature. One popular approach is the ℓ1-norm regularized Gaussian maximum likelihood estimation [11], [12], [13], and numerous algorithms have been proposed for solving this problem. A representative, yet not exhaustive, list of works in the literature includes: block coordinate ascent [11], [25], Nesterov's smooth gradient method [12], projected gradient method [26], projected quasi-Newton [27], augmented Lagrangian method [28], inexact interior point method [29], primal proximal-point with Newton-CG method [30], and Newton's method with quadratic approximation [31], [32], [33]. However, none of the methods mentioned above can be directly extended to estimate precision matrices in our problem because of the MTP2 constraints.

To estimate precision matrices under MTP2 constraints, one option is the Gaussian maximum likelihood estimator [5], [34], where the sign constraints implicitly promote sparsity of the solution. To solve this problem, a primal algorithm [2] and a dual algorithm [23] were proposed based on block coordinate descent. The authors in [6] proposed the ℓ1-norm regularized Gaussian maximum likelihood estimation, updating each column/row of the variable by solving a nonnegative quadratic program. In addition, there is growing interest in learning graphs under generalized graph Laplacian models [6], [23], [35], where the precision matrices satisfy MTP2 constraints. In this paper, we aim to propose an efficient algorithm for estimating sparse precision matrices under MTP2 constraints, and to establish successful edge recovery guarantees.

SECTION III.

Proposed Algorithm and Theoretical Results

In this section, we first derive a second-order algorithm to solve the ℓ1-norm regularized Gaussian maximum likelihood estimation under MTP2 constraints, and then provide theoretical results on successful recovery of the graph edges.

A. Proposed Algorithm

To solve Problem (2), we propose a projected Newton-like algorithm. In each iteration, we partition the variables into two sets, the free set and the restricted set, and only update the variables in the free set while fixing the variables in the restricted set. The restricted set is defined by

I_k := \{(i,j) \in [p]^2 \mid [X_k]_{ij} = 0, \; [\nabla f(X_k)]_{ij} < 0\},   (4)

where f is the objective function of Problem (2), and the free set is the complement of the restricted set, i.e., I_k^c.
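For illustration, a one-line NumPy sketch of this partition is given below; the numerical tolerance tol is a hypothetical implementation detail, since (4) compares against exact zeros.

import numpy as np

def restricted_set_mask(X, grad, tol=1e-12):
    # Boolean mask of I_k in (4): entries pinned at the boundary X_ij = 0
    # whose gradient is negative; the free set is the complement (~mask).
    return (np.abs(X) <= tol) & (grad < 0)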

Now we construct the projected Newton-like step as follows,

X_{k+1} = P_{\Omega}(X_k - \gamma_k P_k),   (5)

where γ_k is the step size in the k-th iteration, P_Ω is the projection onto the set Ω := {X | X_ij ≤ 0, ∀ i ≠ j}, and P_k is the approximate Newton direction defined by

\mathrm{pvec}_k(P_k) = Q_k\, \mathrm{pvec}_k(\nabla f(X_k)),   (6)

where Q_k is the gradient scaling matrix, which is an approximate inverse Hessian. We define pvec_k(X) as follows,

\mathrm{pvec}_k(X) := \begin{bmatrix} [X]_{I_k^c} \\ [X]_{I_k} \end{bmatrix},   (7)

where [X]_{I_k} ∈ ℝ^{|I_k|} and [X]_{I_k^c} ∈ ℝ^{|I_k^c|} denote the two vectors containing all the elements of X indexed by the sets I_k and I_k^c, respectively. According to (7), pvec_k(X) stacks the elements of X into a single vector, similar to vec(X), except that it places the elements indexed by I_k^c first and those indexed by I_k second.

The approximate Newton direction in (6) may involve computing the inverse of a Hessian matrix of dimension p^2 × p^2, or solving a system of linear equations of the same dimension, which is computationally challenging. In what follows, we aim to simplify the computation in (6). We construct the gradient scaling matrix Q_k as follows,

Q_k = \begin{bmatrix} [H_k^{-1}]_{I_k^c I_k^c} & 0 \\ 0 & D_k \end{bmatrix},

where D_k is a positive definite diagonal matrix of dimension |I_k| × |I_k|, and [H_k^{-1}]_{I_k^c I_k^c} is the principal submatrix of H_k^{-1} keeping the rows and columns indexed by I_k^c, in which H_k is the Hessian matrix. By direct calculation, we obtain H_k = X_k^{-1} ⊗ X_k^{-1}, and thus

H_k^{-1} = X_k \otimes X_k.   (8)

Following from the property of the Kronecker product that vec(ABC) = (C^⊤ ⊗ A) vec(B), we obtain

H_k^{-1}\, \mathrm{vec}(\nabla f(X_k)) = \mathrm{vec}(X_k \nabla f(X_k) X_k).   (9)
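The identity (9) is what removes the p^2 × p^2 object from the computation: applying H_k^{-1} to a vectorized gradient reduces to two p × p matrix products. The following small NumPy check illustrates this; the matrices are random stand-ins for X_k and ∇f(X_k).

import numpy as np

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
X = A @ A.T + p * np.eye(p)          # positive definite stand-in for X_k
G = rng.standard_normal((p, p))      # stand-in for the gradient at X_k

lhs = np.kron(X, X) @ G.reshape(-1)  # (X_k (x) X_k) vec(G): O(p^4) memory
rhs = (X @ G @ X).reshape(-1)        # vec(X_k G X_k): only p x p products
assert np.allclose(lhs, rhs)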

As a result, we can get

Q_k\, \mathrm{pvec}_k(\nabla f(X_k)) = \begin{bmatrix} [H_k^{-1}]_{I_k^c I_k^c} & 0 \\ 0 & D_k \end{bmatrix} \begin{bmatrix} [\nabla f(X_k)]_{I_k^c} \\ [\nabla f(X_k)]_{I_k} \end{bmatrix} = \begin{bmatrix} [X_k P_{I_k^c}(\nabla f(X_k)) X_k]_{I_k^c} \\ D_k [\nabla f(X_k)]_{I_k} \end{bmatrix},

where P_{I_k^c}(A) is defined as follows,

[P_{I_k^c}(A)]_{ij} := \begin{cases} A_{ij} & \text{if } (i,j) \in I_k^c, \\ 0 & \text{otherwise}. \end{cases}

Therefore, we have

[P_k]_{I_k^c} = [X_k P_{I_k^c}(\nabla f(X_k)) X_k]_{I_k^c}.

In each iteration, we only update the variables in the free set, i.e., [X]_{I_k^c}, and set the remaining variables in the restricted set to zero. Finally, we can rewrite the projected Newton-like iteration in (5) as

[X_{k+1}]_{I_k^c} = P_{\Omega}\big([X_k]_{I_k^c} - \gamma_k\, [X_k P_{I_k^c}(\nabla f(X_k)) X_k]_{I_k^c}\big),

and

[X_{k+1}]_{I_k} = 0.

Note that Newton-type methods for solving Problem (2) are usually computationally expensive because they require the (approximate) inverse of Hessian matrices of dimension p^2 × p^2. In contrast, the iterates constructed above involve only p × p matrix multiplications and gradient computations. The step size can be computed by the Armijo step-size rule.

Algorithm 1: Projected Newton-like method
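Since the pseudocode of Algorithm 1 appears as a figure, the following Python/NumPy sketch summarizes one possible implementation of the iteration described above: the free/restricted partition (4), the scaled direction X_k P_{I_k^c}(∇f(X_k)) X_k, the projection onto Ω, and an Armijo backtracking line search. The initialization, tolerances, and stopping rule are illustrative assumptions rather than the paper's exact choices, and the gradient expression assumes the ℓ1 term is treated as the linear function −λ Σ_{i≠j} X_ij on the feasible cone.

import numpy as np

def objective(X, S, lam):
    # -log det(X) + tr(X S) + lam * sum_{i != j} |X_ij|; +inf outside S^p_++
    sign, logdet = np.linalg.slogdet(X)
    if sign <= 0:
        return np.inf
    off = X - np.diag(np.diag(X))
    return -logdet + np.trace(X @ S) + lam * np.abs(off).sum()

def gradient(X, S, lam):
    # On the feasible cone (X_ij <= 0, i != j) we have |X_ij| = -X_ij, so the
    # objective is smooth with gradient S - X^{-1} - lam * (J - I).
    p = X.shape[0]
    return S - np.linalg.inv(X) - lam * (np.ones((p, p)) - np.eye(p))

def projected_newton_like(S, lam, max_iter=200, tol=1e-6, beta=0.5, sigma=1e-4):
    # A minimal sketch of Algorithm 1; hyperparameter values are illustrative.
    p = S.shape[0]
    X = np.diag(1.0 / np.diag(S))                     # feasible diagonal start
    off_mask = ~np.eye(p, dtype=bool)
    for _ in range(max_iter):
        G = gradient(X, S, lam)
        restricted = (np.abs(X) <= 1e-12) & (G < 0)   # the set I_k in (4)
        free = ~restricted
        G_free = np.where(free, G, 0.0)               # P_{I_k^c}(grad f(X_k))
        P = X @ G_free @ X                            # direction on the free set
        f_old = objective(X, S, lam)
        decrease = np.sum(G_free * P)                 # >= 0 since X is PD
        gamma = 1.0
        while True:                                   # Armijo backtracking
            X_new = np.where(free, X - gamma * P, 0.0)
            X_new[off_mask] = np.minimum(X_new[off_mask], 0.0)  # project onto Omega
            X_new = (X_new + X_new.T) / 2
            if objective(X_new, S, lam) <= f_old - sigma * gamma * decrease:
                break
            gamma *= beta
            if gamma < 1e-12:
                break
        if np.linalg.norm(X_new - X, "fro") <= tol * max(1.0, np.linalg.norm(X, "fro")):
            return X_new
        X = X_new
    return X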

B. Theoretical Results

In this subsection, we provide theoretical results showing that the estimator X̂ defined in (2) can recover the underlying graph edges correctly under the irrepresentability condition.

Let S := {(i,j) | Θ_ij ≠ 0} be the support set of the underlying precision matrix Θ, and let d be the maximum number of nonzero elements in any row of Θ. Define

K_{\Sigma} := \max_{i \in \{1,\ldots,p\}} \sum_{j=1}^{p} |\Sigma_{ij}|, \quad \text{and} \quad K_H := |||(H_{SS})^{-1}|||_{\infty},   (10)

where Σ is the underlying covariance matrix, and H_SS is the principal submatrix of H with both rows and columns indexed by S, in which H is the Hessian matrix at Θ.

Assumption 1. There exists some α ∈ (0,1] such that

|||H_{S^c S}(H_{SS})^{-1}|||_{\infty} \leq 1 - \alpha.   (11)

Assumption 2. The nonzero elements of the underlying precision matrix Θ satisfy

\min_{(i,j) \in S} |\Theta_{ij}| \geq \left(1 + \frac{\alpha}{2}\right) K_H \lambda.   (12)

Assumption 1 is the irrepresentability condition, which is almost necessary for ℓ1-norm approaches to recover the support correctly [14]. Assumption 2 imposes a lower bound on the minimum absolute value of the nonzero elements of Θ.

Theorem 1. Suppose the sample covariance matrix satisfies ‖S − Σ‖_max ≤ αλ/4. Under Assumptions 1 and 2, if the sample size satisfies n ≥ c d^2 log p, where

c = \left( 3\left(1 + \frac{\alpha}{2}\right) c_{\lambda} K_H K_{\Sigma} \max\!\left( \left(1 + \frac{2}{\alpha}\right) K_H K_{\Sigma}^{2},\; 1 \right) \right)^{2},

then the estimator X̂ obtained by solving Problem (2) with λ = c_λ √(log p / n) recovers the support of Θ correctly, i.e.,

\mathrm{supp}(\hat{X}) = \mathrm{supp}(\Theta),   (13)

where c_λ is a positive constant.

Theorem 1 shows that the support of the minimizer X̂ of Problem (2) is the same as that of the underlying precision matrix Θ under the irrepresentability condition, implying that all the underlying graph edges can be identified correctly through X̂. In addition, the condition ‖S − Σ‖_max ≤ αλ/4 holds with overwhelming probability 1 − c1 exp(−c2 log p) for Gaussian observations, where c1 and c2 are two positive constants.
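In practice, Theorem 1 suggests setting λ proportional to √(log p / n) and then comparing supports. A small sketch of such a check is given below; the thresholding tolerance used to declare an entry nonzero is a hypothetical numerical choice.

import numpy as np

def support_recovered(X_hat, Theta, thresh=1e-8):
    # True iff supp(X_hat) == supp(Theta); entries with magnitude below the
    # (hypothetical) threshold are treated as zeros.
    return bool(np.array_equal(np.abs(X_hat) > thresh, np.abs(Theta) > thresh))

# Regularization suggested by Theorem 1, with c_lambda a user-chosen constant:
#   lam = c_lambda * np.sqrt(np.log(p) / n)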

SECTION IV.

Experimental Results

In this section, we present numerical results on synthetic data and financial time-series data to verify the performance of the proposed algorithm. All experiments are conducted on a PC with a 2.8 GHz Intel Core i7 CPU and 16 GB of RAM.

A. Synthetic Data

We consider learning Barabasi-Albert graphs, which are useful in many applications. The graph structure and the weights associated with its edges are generated randomly. We obtain a weighted adjacency matrix W, where W_ij denotes the graph weight between node i and node j. We construct the underlying precision matrix by Θ = D − W, where D is a diagonal matrix generated to ensure that Θ is an M-matrix and the irrepresentability condition in Assumption 1 holds.

Given the underlying precision matrix Θ ∈ ℝ^{p×p}, we generate n independent observations y^(1),…,y^(n) ∼ N(0, Θ^{-1}), and compute the sample covariance matrix S = (1/n) Σ_{i=1}^n y^(i) (y^(i))^⊤.
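A possible way to reproduce this synthetic setup with NetworkX/NumPy is sketched below. The edge-weight range, the diagonal margin used to make Θ an M-matrix, and the problem sizes are illustrative assumptions, and this sketch does not explicitly enforce the irrepresentability condition.

import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
p, n = 100, 300                              # illustrative sizes, not the paper's

# Barabasi-Albert topology with random positive edge weights
G = nx.barabasi_albert_graph(p, m=2, seed=0)
W = np.zeros((p, p))
for i, j in G.edges():
    W[i, j] = W[j, i] = rng.uniform(0.5, 1.0)

# Theta = D - W with D diagonal; adding a small margin to the row sums is one
# simple (hypothetical) way to obtain a nonsingular M-matrix.
D = np.diag(W.sum(axis=1) + 0.1)
Theta = D - W

# n i.i.d. samples from N(0, Theta^{-1}) and the sample covariance matrix
Y = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n)
S = Y.T @ Y / n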

We compare the computational time with the state-of-the-art BCD [2] and GGL [6] algorithms for solving the ℓ1-norm regularized Gaussian maximum likelihood estimation. Table I shows that the three methods reach the same objective function value, because all of them obtain the global minimizer; however, our proposed algorithm takes significantly less computational time than BCD and GGL. The results are averaged over 10 Monte Carlo realizations. In addition, GGL can incorporate connectivity constraints in graph learning.

TABLE I: Comparisons of computational time in learning Barabasi-Albert graphs. "Objective" denotes the objective function value; time is reported in seconds.

Next, we show that the estimator defined in (2) can recover the graph edges correctly under the irrepresentability condition. We use the F-score to measure the performance of edge recovery, defined by

\text{F-score} := \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},   (14)

where TP denotes the number of true positives, FP the number of false positives, and FN the number of false negatives. The F-score takes values in [0, 1], and 1 indicates perfect structure recovery.
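A direct implementation of (14) from the estimated and true precision matrices might look as follows; the thresholding tolerance used to declare an off-diagonal entry nonzero is a hypothetical choice.

import numpy as np

def f_score(X_hat, Theta, thresh=1e-8):
    # F-score of edge recovery as in (14), comparing off-diagonal supports.
    p = Theta.shape[0]
    off = ~np.eye(p, dtype=bool)
    est = np.abs(X_hat[off]) > thresh
    true = np.abs(Theta[off]) > thresh
    tp = np.sum(est & true)
    fp = np.sum(est & ~true)
    fn = np.sum(~est & true)
    return 2 * tp / (2 * tp + fp + fn)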

Figure 1 shows that our algorithm achieves an F-score of 1, implying that the graph structure is identified perfectly under the irrepresentability condition, which is consistent with our theoretical results in Theorem 1. We compare our method with the well-known Glasso method [25], which does not impose the MTP2 constraints. Our algorithm obtains a higher F-score than Glasso in the regime of small sample size ratios. This is expected because imposing more prior knowledge helps to improve the estimation performance, especially when samples are limited. The results are averaged over 30 realizations.

Fig. 1: F-score as a function of the sample size ratio n/p in learning Barabasi-Albert random graphs. The regularization parameter for both methods is set as λ = 0.05.

B. Financial Time-series Data

MTP2 models are well justified on stock data since the market factor leads to positive dependencies among stocks [3]. The data are collected from 189 stocks composing the S&P 500 index during the period from Oct. 1st, 2017 to Jan. 1st, 2018, resulting in 62 observations per stock, i.e., p = 189 and n = 62. We construct the log-return data matrix by

X_{i,j} = \log P_{i,j} - \log P_{i-1,j},   (15)

where P_{i,j} denotes the closing price of the j-th stock on the i-th day. The stocks are categorized into four sectors: Information Technology, Industrials, Consumer Staples, and Energy.
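The construction in (15) amounts to differencing log prices along the time axis, e.g. as in the following sketch, where prices is a synthetic stand-in array of closing prices (rows: trading days, columns: stocks) rather than the actual S&P 500 data.

import numpy as np

# Hypothetical (n+1) x p array of closing prices; replace with real data.
prices = np.random.default_rng(0).uniform(50, 150, size=(63, 189))
X = np.log(prices[1:]) - np.log(prices[:-1])   # log-returns, shape (62, 189)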

Figure 2 shows that our algorithm performs better than Glasso: the latter produces more gray-colored connections, which link stocks from distinct sectors and are often spurious from a practical perspective. The modularity values for Glasso and the proposed method are 0.41 and 0.45, respectively; a higher modularity value indicates a better representation of the actual network of stocks.

Fig. 2: Stock graphs learned via (a) Glasso, and (b) the proposed method.

SECTION V.

Conclusions

In this paper, we have proposed a projected Newton-like algorithm to solve the ℓ1-norm regularized Gaussian maximum likelihood estimation under MTP2 constraints. The proposed algorithm significantly reduces the computational cost of computing the approximate Newton direction. We have proved that our method can recover the graph edges correctly under the irrepresentability condition, which is verified by the numerical results. Experiments demonstrate that the proposed algorithm takes significantly less computational time than state-of-the-art methods.

    References

    1.
    M. Denuit, J. Dhaene, M. Goovaerts and R. Kaas, Actuarial theory for dependent risks: measures orders and models, John Wiley & Sons, 2006.
    2.
    M. Slawski and M. Hein, "Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields", Linear Algebra and its Applications, vol. 473, pp. 145-179, 2015.
    3.
    R. Agrawal, U. Roy and C. Uhler, "Covariance Matrix Estimation under Total Positivity for Portfolio Selection", Journal of Financial Econometrics, 2020.
    4.
    Y. Wang, U. Roy and C. Uhler, "Learning high-dimensional gaussian graphical models under total positivity without adjustment of tuning parameters", International Conference on Artificial Intelligence and Statistics, pp. 2698-2708, 2020.
    5.
    S. Lauritzen, C. Uhler, P. Zwiernik et al., "Maximum likelihood estimation in Gaussian models under total positivity", The Annals of Statistics, vol. 47, no. 4, pp. 1835-1863, 2019.
    6.
    H. E. Egilmez, E. Pavez and A. Ortega, "Graph learning from data under Laplacian and structural constraints", IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 825-841, 2017.
    7.
    X. Dong, D. Thanou, M. Rabbat and P. Frossard, "Learning graphs from data: A signal representation perspective", IEEE Signal Processing Magazine, vol. 36, no. 3, pp. 44-63, 2019.
    8.
    J. V. d. M. Cardoso, J. Ying and D. P. Palomar, "Algorithms for learning graphs in financial markets", arXiv preprint arXiv:2012.15410, 2020.
    9.
    S. Fallat, S. Lauritzen, K. Sadeghi, C. Uhler, N. Wermuth, P. Zwiernik et al., "Total positivity in Markov structures", The Annals of Statistics, vol. 45, no. 3, pp. 1152-1184, 2017.
    10.
    S. Lauritzen, C. Uhler and P. Zwiernik, "Total positivity in exponential families with application to binary variables", The Annals of Statistics, vol. 49, no. 3, pp. 1436-1459, 2021.
    11.
    O. Banerjee, L. E. Ghaoui and A. d’Aspremont, "Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data", Journal of Machine Learning Research, vol. 9, no. Mar, pp. 485-516, 2008.
    12.
    A. d’Aspremont, O. Banerjee and L. El Ghaoui, "First-order methods for sparse covariance selection", SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 1, pp. 56-66, 2008.
    13.
    M. Yuan and Y. Lin, "Model selection and estimation in the Gaussian graphical model", Biometrika, vol. 94, no. 1, pp. 19-35, 2007.
    14.
    P. Ravikumar, M. J. Wainwright, G. Raskutti, B. Yu et al., "High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence", Electronic Journal of Statistics, vol. 5, pp. 935-980, 2011.
    15.
    R. Mazumder and T. Hastie, "The graphical lasso: New insights and alternatives", Electronic Journal of Statistics, vol. 6, pp. 2125, 2012.
    16.
    C.-J. Hsieh, A. Banerjee, I. S. Dhillon and P. K. Ravikumar, "A divide-and-conquer method for sparse inverse covariance estimation", Advances in Neural Information Processing Systems, pp. 2330-2338, 2012.
    17.
    C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, P. K. Ravikumar and R. Poldrack, "BIG & QUIC: Sparse inverse covariance estimation for a million variables", Advances in Neural Information Processing Systems, pp. 3165-3173, 2013.
    18.
    E. Yang, G. Allen, Z. Liu and P. K. Ravikumar, "Graphical models via generalized linear models", Advances in Neural Information Processing Systems, pp. 1358-1366, 2012.
    19.
    E. Yang, P. Ravikumar, G. I. Allen and Z. Liu, "Graphical models via univariate exponential family distributions", The Journal of Machine Learning Research, vol. 16, no. 1, pp. 3813-3847, 2015.
    20.
    J. Honorio, D. Samaras, I. Rish and G. Cecchi, "Variable selection for Gaussian graphical models", International Conference on Artificial Intelligence and Statistics, pp. 538-546, 2012.
    21.
    C. McCarter and S. Kim, "Large-scale optimization algorithms for sparse conditional Gaussian graphical models", International Conference on Artificial Intelligence and Statistics, pp. 528-537, 2016.
    22.
    J. Chen, P. Xu, L. Wang, J. Ma and Q. Gu, "Covariate adjusted precision matrix estimation via nonconvex optimization", International Conference on Machine Learning, pp. 921-930, 2018.
    23.
    E. Pavez and A. Ortega, "Generalized Laplacian precision matrix estimation for graph signal processing", 2016 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 6350-6354, 2016.
    24.
    E. Bølviken, "Probability inequalities for the multivariate normal with non-negative partial correlations", Scandinavian Journal of Statistics, pp. 49-58, 1982.
    25.
    J. Friedman, T. Hastie and R. Tibshirani, "Sparse inverse covariance estimation with the graphical lasso", Biostatistics, vol. 9, no. 3, pp. 432-441, 2008.
    26.
    J. Duchi, S. Gould and D. Koller, "Projected subgradient methods for learning sparse gaussians", Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pp. 153-160, 2008.
    27.
    M. Schmidt, E. Berg, M. Friedlander and K. Murphy, "Optimizing costly functions with simple constraints: A limited-memory projected quasi-newton algorithm", Artificial Intelligence and Statistics, pp. 456-463, 2009.
    28.
    K. Scheinberg, S. Ma and D. Goldfarb, "Sparse inverse covariance selection via alternating linearization methods", Advances in Neural Information Processing Systems, pp. 2101-2109, 2010.
    29.
    L. Li and K.-C. Toh, "An inexact interior point method for L1-regularized sparse covariance selection", Mathematical Programming Computation, vol. 2, no. 3-4, pp. 291-315, 2010.
    30.
    C. Wang, D. Sun and K.-C. Toh, "Solving log-determinant optimization problems by a Newton-CG primal proximal point algorithm", SIAM Journal on Optimization, vol. 20, no. 6, pp. 2994-3013, 2010.
    31.
    C. Hsieh, I. S. Dhillon, P. K. Ravikumar and M. A. Sustik, "Sparse inverse covariance matrix estimation using quadratic approximation", Advances in Neural Information Processing Systems, pp. 2330-2338, 2011.
    32.
    F. Oztoprak, J. Nocedal, S. Rennie and P. A. Olsen, "Newton-like methods for sparse inverse covariance estimation", Advances in Neural Information Processing Systems, pp. 755-763, 2012.
    33.
    C.-J. Hsieh, M. A. Sustik, I. S. Dhillon and P. Ravikumar, "QUIC: quadratic approximation for sparse inverse covariance estimation", Journal of Machine Learning Research, vol. 15, no. 1, pp. 2911-2947, 2014.
    34.
    J. A. Soloff, A. Guntuboyina and M. I. Jordan, "Covariance estimation with nonnegative partial correlations", arXiv preprint arXiv:2007.15252, 2020.
    35.
    E. Pavez, H. E. Egilmez and A. Ortega, "Learning graphs with monotone topology properties and multiple connected components", IEEE Transactions on Signal Processing, vol. 66, no. 9, pp. 2399-2413, 2018.