We consider the problem of inferring the conditional independence graph (CIG) of a stationary high-dimensional discrete-time process or time series $\mathbf{x}[n]$ from observing $N$ samples $\mathbf{x}[1], \ldots, \mathbf{x}[N]$. This problem is referred to as graphical model selection (GMS) and is of great practical interest, e.g., for gene analysis, econometrics, environmental monitoring and medical diagnosis [1]–[5].
Most existing work on GMS for time series is based on finite-dimensional parametric models [6], [7]. By contrast, a first nonparametric GMS method for high-dimensional time series has recently been proposed [8]. This approach is based on performing neighborhood regression (cf. [9]) in the frequency domain.
In this paper, we present an alternative nonparametric GMS scheme for time series based on generalizing the graphical LASSO (gLASSO) [10]–[12] to stationary time series. The resulting algorithm is implemented using the alternating direction method of multipliers (ADMM) [13], for which we derive closed-form update rules. Typically, ADMM-based methods allow for straightforward distributed implementations, e.g., in wireless sensor networks [13], [14].
While our approach is algorithmically similar to the joint gLASSO proposed in [15], the deployment of a gLASSO-type algorithm for GMS of time series is new.
Our main analytical contribution is a performance analysis which yields an upper bound on the probability that our scheme fails to correctly identify the CIG. The effectiveness of our GMS method is also verified numerically.
Notation. Given a natural number $F$, we define $[F] := \{1, \ldots, F\}$. For a square matrix $\mathbf{X} \in \mathbb{C}^{p \times p}$, we denote by $\bar{\mathbf{X}}$, $\mathbf{X}^H$, $\mathrm{tr}\{\mathbf{X}\}$ and $|\mathbf{X}|$ its elementwise complex conjugate, its Hermitian transpose, its trace and its determinant, respectively. We also need the matrix norm $\|\mathbf{X}\|_\infty := \max_{i,j} |X_{i,j}|$. By writing $\mathbf{X} \preceq \mathbf{Y}$ we mean that $\mathbf{Y} - \mathbf{X}$ is a positive-semidefinite (psd) matrix. A strictly positive definite matrix $\mathbf{X}$ is indicated by $\mathbf{0} \prec \mathbf{X}$.
We denote by $\mathcal{H}_p^{[F]}$ the set of all length-$F$ sequences $\mathbf{X}[\cdot] := (\mathbf{X}[1], \ldots, \mathbf{X}[F])$ of Hermitian matrices $\mathbf{X}[f] \in \mathbb{C}^{p \times p}$. For a sequence $\mathbf{X}[\cdot] \in \mathcal{H}_p^{[F]}$, we define $\|X_{i,j}[\cdot]\|^2 := (1/F) \sum_{f \in [F]} |X_{i,j}[f]|^2$, its squared generalized Frobenius norm $\|\mathbf{X}[\cdot]\|_{\mathrm{F}}^2 := \sum_{i,j} \|X_{i,j}[\cdot]\|^2$ and its $\ell_1$-norm $\|\mathbf{X}\|_1 := \sum_{i,j} \|X_{i,j}[\cdot]\|$. We equip the set $\mathcal{H}_p^{[F]}$ with the inner product $\langle \mathbf{A}[\cdot], \mathbf{B}[\cdot] \rangle := (1/F) \sum_{f \in [F]} \mathrm{tr}\{\mathbf{A}[f] \mathbf{B}[f]\}$.
For a sequence $\mathbf{X}[\cdot] \in \mathcal{H}_p^{[F]}$ and a subset $\mathcal{S} \subseteq [p] \times [p]$, we denote by $\mathbf{X}_{\mathcal{S}}[\cdot]$ the matrix sequence obtained by zeroing, separately for each index $f \in [F]$, all entries of the matrix $\mathbf{X}[f]$ except those in $\mathcal{S}$. The generalized support of a sequence $\mathbf{X}[\cdot] \in \mathcal{H}_p^{[F]}$ is defined as $\mathrm{gsupp}(\mathbf{X}[\cdot]) := \{(i,j) \in [p] \times [p] : (\mathbf{X}[f])_{i,j} \neq 0 \text{ for some } f \in [F]\}$. We also use $a_+ := \max\{a, 0\}$.
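To make this notation concrete, the following minimal Python sketch (our own illustration; the function names are not from the paper) implements these norms and the generalized support for a matrix sequence $\mathbf{X}[\cdot]$, represented as a NumPy array of shape (F, p, p).

```python
import numpy as np

def entry_norm(X, i, j):
    """||X_{i,j}[.]|| = sqrt((1/F) * sum_f |X_{i,j}[f]|^2)."""
    F = X.shape[0]
    return np.sqrt(np.sum(np.abs(X[:, i, j]) ** 2) / F)

def frob_norm_sq(X):
    """Squared generalized Frobenius norm: sum_{i,j} ||X_{i,j}[.]||^2."""
    F = X.shape[0]
    return np.sum(np.abs(X) ** 2) / F

def l1_norm(X):
    """l1-norm: sum_{i,j} ||X_{i,j}[.]||."""
    F = X.shape[0]
    return np.sum(np.sqrt(np.sum(np.abs(X) ** 2, axis=0) / F))

def gsupp(X, tol=1e-12):
    """Generalized support: entries (i,j) that are nonzero for some f."""
    mask = np.any(np.abs(X) > tol, axis=0)
    return {(i, j) for i, j in zip(*np.nonzero(mask))}
```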
SECTION II.
Problem Formulation
Consider a $p$-dimensional real-valued zero-mean stationary time series $\mathbf{x}[n] = (x_1[n], \ldots, x_p[n])^T$, for $n \in \mathbb{Z}$. We assume its autocorrelation function (ACF) $\mathbf{R}[m] := \mathrm{E}\{\mathbf{x}[m] \mathbf{x}^T[0]\}$ to be absolutely summable, i.e., $\sum_{m=-\infty}^{\infty} \|\mathbf{R}[m]\| < \infty$, such that we can define the spectral density matrix (SDM) $\mathbf{S}(\theta)$ via a Fourier transform:
$$\mathbf{S}(\theta) := \sum_{m=-\infty}^{\infty} \mathbf{R}[m] \exp(-j 2\pi m\theta), \quad \theta \in [0,1). \tag{1}$$
We require the eigenvalues of the SDM to be uniformly bounded as
$$L \leq \lambda_i(\mathbf{S}(\theta)) \leq U, \quad \text{for all } \theta \in [0,1), \tag{2}$$
where, without loss of generality, we will assume $L = 1$ in what follows. The upper bound in (2) is valid if the ACF $\mathbf{R}[m]$ is summable; the lower bound ensures certain Markov properties of the CIG [3], [16], [17].
Our approach is based on the assumption that the SDM is a smooth function. Due to the Fourier relationship (1), this smoothness will be quantified via certain ACF moments
$$\mu_x^{(h)} := \sum_{m=-\infty}^{\infty} \|\mathbf{R}[m]\|_\infty\, h[m]. \tag{3}$$
Here, $h[m]$ denotes a weight function which typically increases with $|m|$. For a process with sufficiently small moment $\mu_x^{(h)}$, thereby enforcing smoothness of the SDM, we may base our considerations on a discretized version of the SDM, given by $\mathbf{S}[f] = \mathbf{S}(\theta_f)$, with $\theta_f := (f-1)/F$, for $f \in [F]$. The number $F$ of sampling points is a design parameter which has to be chosen suitably large compared to the ACF moment $\mu_x^{(h)}$ (cf. [8, Lemma 2.1]).
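For illustration, the next sketch (again our own, not part of the paper) evaluates the Fourier sum (1) at the grid points $\theta_f = (f-1)/F$ for an ACF truncated to a finite number of lags.

```python
import numpy as np

def discretized_sdm(R, F):
    """Evaluate S(theta_f), theta_f = (f-1)/F, f = 1..F, from a truncated ACF.

    R: array of shape (2*M+1, p, p) holding R[-M], ..., R[M]
       (row m + M stores the lag-m autocorrelation matrix).
    Returns S of shape (F, p, p).
    """
    M = (R.shape[0] - 1) // 2
    p = R.shape[1]
    S = np.zeros((F, p, p), dtype=complex)
    for f in range(F):
        theta = f / F  # 0-indexed: corresponds to theta_f = (f-1)/F, f = 1..F
        for m in range(-M, M + 1):
            S[f] += R[m + M] * np.exp(-2j * np.pi * m * theta)
    return S
```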
The CIG of a process $\mathbf{x}[n]$ is a simple undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with node set $\mathcal{V} = [p]$. Each node $r \in \mathcal{V}$ represents a single scalar component process $x_r[n]$. An edge between nodes $r$ and $r'$ is absent, i.e., $(r, r') \notin \mathcal{E}$, if the component processes $x_r[n]$ and $x_{r'}[n]$ are conditionally independent given all remaining component processes [3].
If the process $\mathbf{x}[n]$ is Gaussian, the CIG can be conveniently characterized via the inverse SDM $\mathbf{K}[f] := \mathbf{S}^{-1}[f]$ of the process. More specifically, it can be shown that, for sufficiently small $\mu_x^{(h)}$ with $h[m] = |m|$ [3], [8],
$$(i,j) \in \mathcal{E} \Leftrightarrow (i,j) \in \mathrm{gsupp}(\mathbf{K}[\cdot]). \tag{4}$$
Thus, the edge set $\mathcal{E}$ of the CIG is determined by the generalized support of the inverse SDM $\mathbf{K}[f]$, for $f \in [F]$.
Our goal is to robustly estimate the CIG from a finite-length observation, which incurs unavoidable estimation errors. Therefore, we have to require that, in addition to (4), the non-zero off-diagonal entries of $\mathbf{K}[f]$ are sufficiently large, such that a certain amount of estimation error can be tolerated. To this end, we define the (un-normalized) minimum global partial spectral coherence of the process as
$$\rho_x := \min_{(i,j) \in \mathcal{E}} \|K_{i,j}[\cdot]\|.$$
For the analysis of our GMS scheme we require
Assumption II.1:
We have $\rho_x \geq \rho_{\min}$ for a known $\rho_{\min} > 0$.
Our approach to GMS in the high-dimensional regime exploits a specific problem structure induced by the assumption that the true CIG is sparse.
Assumption II.2:
The CIG of the observed process $\mathbf{x}[n]$ is sparse in the sense that $|\mathcal{E}| \leq s$ for some small $s \ll p(p-1)/2$.
The performance analysis of the proposed GMS algorithm requires quantifying the conditioning of SDM sub-matrices. In particular, we will use the following assumption, which is a natural extension of the (multitask) compatibility condition, originally introduced in [11] to analyze the LASSO for the ordinary sparse linear (multitask) model.
Assumption II.3:
Given a process $\mathbf{x}[n]$ whose CIG contains no more than $s$ edges indexed by $\mathcal{S} \subseteq [p] \times [p]$, we assume that there exists a positive constant $\phi$ such that
$$\frac{1}{F} \sum_{f \in [F]} \mathrm{vec}\{\mathbf{X}[f]\}^H (\bar{\mathbf{S}}[f] \otimes \mathbf{S}[f])\, \mathrm{vec}\{\mathbf{X}[f]\} \geq \frac{\phi}{s} \|\mathbf{X}_{\mathcal{S}}\|_1^2 \tag{5}$$
holds for all $\mathbf{X}[\cdot] \in \mathcal{H}_p^{[F]}$ with $\|\mathbf{X}_{\mathcal{S}^c}\|_1 \leq 3 \|\mathbf{X}_{\mathcal{S}}\|_1$.
The constant $\phi$ in (5) is essentially a lower bound on the eigenvalues of small sub-matrices of the SDM. As such, Assumption II.3 is closely related to the concept of the restricted isometry property (RIP) [18].
As is easily verified, Assumption II.3 always holds with $\phi = L$ for a process satisfying (2). However, for processes having a sparse CIG, we typically expect $\phi \gg L$.
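As a concrete (purely illustrative) rendering of the quadratic form on the left-hand side of (5), the following sketch evaluates it for given SDM and coefficient sequences; vec denotes the standard column-stacking operation.

```python
import numpy as np

def compatibility_lhs(S, X):
    """(1/F) * sum_f vec{X[f]}^H (conj(S[f]) kron S[f]) vec{X[f]}.

    S, X: arrays of shape (F, p, p); S[f] Hermitian psd.
    """
    F = S.shape[0]
    total = 0.0
    for f in range(F):
        x = X[f].flatten(order="F")          # column-stacking vec{X[f]}
        Q = np.kron(np.conj(S[f]), S[f])     # \bar{S}[f] (x) S[f]
        total += np.real(np.vdot(x, Q @ x))  # x^H Q x (real, since Q is Hermitian)
    return total / F
```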
SECTION III.
Graphical LASSO for Time Series
The graphical least absolute shrinkage and selection operator (gLASSO) [10]–[12], [19] is an algorithm for estimating the inverse covariance matrix $\mathbf{K} := \mathbf{C}^{-1}$ of a high-dimensional Gaussian random vector $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$ based on i.i.d. samples. In particular, gLASSO optimizes an $\ell_1$-penalized log-likelihood function and can therefore be interpreted as regularized maximum likelihood estimation.
A. Extending gLASSO to Stationary Time Series
A natural extension of gLASSO to the case of stationary Gaussian time series is obtained by replacing the objective function for the i.i.d. case (which can be interpreted as a penalized log-likelihood) with the corresponding penalized log-likelihood function for a stationary Gaussian process. However, since the exact likelihood lacks a simple closed-form expression (in terms of the SDM), we will use the Whittle approximation [20], [21] to arrive at the following gLASSO estimator for general stationary time series:
$$\widehat{\mathbf{K}}[\cdot] := \mathop{\arg\min}_{\mathbf{X}[\cdot] \in \mathcal{C}} -A\{\mathbf{X}\} + \langle \widehat{\mathbf{S}}[\cdot], \mathbf{X}[\cdot] \rangle + \lambda \|\mathbf{X}[\cdot]\|_1 \tag{6}$$
with $A\{\mathbf{X}\} := (1/F) \sum_{f \in [F]} \log |\mathbf{X}[f]|$ and constraint set
$$\mathcal{C} := \{\mathbf{X}[\cdot] \in \mathcal{H}_p^{[F]} : \mathbf{0} \prec \mathbf{X}[f] \preceq \mathbf{I} \ \text{for all } f \in [F]\}. \tag{7}$$
This constraint set is reasonable since (i) the function $A\{\mathbf{X}\}$ is finite only if $\mathbf{0} \prec \mathbf{X}[f]$ and (ii) the true inverse SDM satisfies $\mathbf{K}[f] \preceq \mathbf{I}$, for all $f \in [F]$, due to (2) (with $L = 1$).
The formulation (6) involves an estimator $\widehat{\mathbf{S}}[f]$ of the SDM values $\mathbf{S}(\theta_f)$, for $f \in [F]$. While in principle any reasonable estimator could be used, we restrict the choice to a multivariate Blackman-Tukey (BT) estimator [22]
$$\widehat{\mathbf{S}}[f] = \sum_{m=-N+1}^{N-1} w[m] \widehat{\mathbf{R}}[m] \exp(-j 2\pi m \theta_f) \tag{8}$$
with the standard biased autocorrelation estimate $\widehat{\mathbf{R}}[m] = (1/N) \sum_{n=m+1}^{N} \mathbf{x}[n] \mathbf{x}^T[n-m]$ for $m = 0, \ldots, N-1$. Enforcing the symmetry $\widehat{\mathbf{R}}[-m] = \widehat{\mathbf{R}}^H[m]$ yields the ACF estimate for $m = -N+1, \ldots, -1$. The window function $w[m]$ in (8) is a design parameter, which can be chosen freely as long as it yields a psd estimate $\widehat{\mathbf{S}}[f]$. A sufficient condition for $\widehat{\mathbf{S}}[f]$ to be psd is non-negativity of the Fourier transform $W(\theta) := \sum_{m=-\infty}^{\infty} w[m] \exp(-j 2\pi \theta m)$ [22, Sec. 2.5.2].
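A minimal NumPy sketch of the BT estimator (8), under the assumptions that the samples are stacked into an array of shape (N, p) and that the window is symmetric, $w[-m] = w[m]$ (whether $W(\theta) \geq 0$, and hence whether the estimate is psd, depends on the caller's choice of window):

```python
import numpy as np

def bt_sdm_estimate(x, F, window):
    """Blackman-Tukey SDM estimate (8) at theta_f = (f-1)/F.

    x:      samples, shape (N, p), one length-p observation per row.
    window: function m -> w[m], assumed symmetric with a non-negative
            Fourier transform (so that the estimate is psd).
    """
    N, p = x.shape
    # Biased ACF estimate: R_hat[m] = (1/N) * sum_{n=m+1}^{N} x[n] x^T[n-m]
    R_hat = np.zeros((N, p, p))
    for m in range(N):
        R_hat[m] = x[m:].T @ x[: N - m] / N
    S_hat = np.zeros((F, p, p), dtype=complex)
    for f in range(F):
        theta = f / F
        S_hat[f] += window(0) * R_hat[0]
        for m in range(1, N):
            term = window(m) * R_hat[m] * np.exp(-2j * np.pi * m * theta)
            # R_hat[-m] = R_hat[m]^T for real data; its contribution is the
            # conjugate-transposed term, keeping S_hat[f] Hermitian.
            S_hat[f] += term + term.conj().T
    return S_hat
```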
The existence of a minimizer in (6) is guaranteed for any choice of $\lambda \geq 0$, as the optimization problem (6) is equivalent to the unconstrained problem $\min_{\mathbf{X}[\cdot]} -A\{\mathbf{X}\} + \langle \widehat{\mathbf{S}}[\cdot], \mathbf{X}[\cdot] \rangle + \lambda \|\mathbf{X}[\cdot]\|_1 + I_{\mathcal{C}}(\mathbf{X}[\cdot])$, where $I_{\mathcal{C}}(\mathbf{X}[\cdot])$ is the indicator function of the constraint set $\mathcal{C}$. Existence of a minimizer of this equivalent problem is guaranteed by [23, Theorem 27.2]: the objective function is a closed proper convex function which is finite only on the bounded set $\mathcal{C}$ (cf. (7)), which trivially implies that the objective function has no direction of recession [23].
We will present in Section III-C a specific choice for the tuning parameter $\lambda$ in (6) which ensures that the gLASSO estimator $\widehat{\mathbf{K}}[\cdot]$ is accurate, i.e., that the estimation error $\boldsymbol{\Delta}[\cdot] := \widehat{\mathbf{K}}[\cdot] - \mathbf{K}[\cdot]$ is small. Based on the gLASSO (6), an estimate for the edge set of the CIG may then be obtained by thresholding:
$$\widehat{\mathcal{E}}(\eta) := \{(i,j) : \|\widehat{K}_{i,j}[\cdot]\| \geq \eta\}. \tag{9}$$
Obviously, under Assumption II.1, if
$$\|\boldsymbol{\Delta}\|_1 \leq \rho_{\min}/2, \tag{10}$$
we have $\widehat{\mathcal{E}}(\rho_{\min}/2) = \mathcal{E}$, i.e., the CIG is recovered perfectly.
B. ADMM Implementation
An efficient numerical method for solving convex optimization problems of the type (6) is the alternating direction method of multipliers (ADMM). Defining the augmented Lagrangian [13] of the problem (6) as
$$L_\rho(\mathbf{X}[\cdot], \mathbf{Z}[\cdot], \mathbf{U}[\cdot]) := -A\{\mathbf{X}\} + \langle \widehat{\mathbf{S}}[\cdot], \mathbf{X}[\cdot] \rangle + \lambda \|\mathbf{Z}\|_1 + (\rho/2) \|\mathbf{X}[\cdot] - \mathbf{Z}[\cdot] + \mathbf{U}[\cdot]\|_{\mathrm{F}}^2,$$
the (scaled) ADMM method iterates, starting from arbitrary initializations $\mathbf{X}^{(0)}[\cdot]$, $\mathbf{Z}^{(0)}[\cdot]$ and $\mathbf{U}^{(0)}[\cdot]$, the following update rules:
$$\mathbf{X}^{(k+1)}[\cdot] = \mathop{\arg\min}_{\mathbf{X}[\cdot] \in \mathcal{C}} L_\rho(\mathbf{X}[\cdot], \mathbf{Z}^{(k)}[\cdot], \mathbf{U}^{(k)}[\cdot]) \tag{11}$$
$$\mathbf{Z}^{(k+1)}[\cdot] = \mathop{\arg\min}_{\mathbf{Z}[\cdot] \in \mathcal{H}_p^{[F]}} L_\rho(\mathbf{X}^{(k+1)}[\cdot], \mathbf{Z}[\cdot], \mathbf{U}^{(k)}[\cdot]) \tag{12}$$
$$\mathbf{U}^{(k+1)}[\cdot] = \mathbf{U}^{(k)}[\cdot] + (\mathbf{X}^{(k+1)}[\cdot] - \mathbf{Z}^{(k+1)}[\cdot]). \tag{13}$$
It can be shown (cf. [13, Sec. 3.1]) that for any $\rho > 0$ the iterates $\mathbf{X}^{(k)}[\cdot]$ converge to a solution of (6), i.e., $\lim_{k \to \infty} \mathbf{X}^{(k)}[\cdot] = \widehat{\mathbf{K}}[\cdot]$. Thus, while the precise choice of $\rho$ has some influence on the convergence speed of ADMM [13, Sec. 3.4.1], it is not crucial. We used $\rho = 100$ in all of our experiments (cf. Section IV).
Closed forms for updates (11) and (12) are stated in
Proposition III.1:
Let us denote the eigenvalue decomposition of the matrix $\widehat{\mathbf{S}}[f] + \rho(\mathbf{U}^{(k)}[f] - \mathbf{Z}^{(k)}[f])$ by $\mathbf{V}_f \mathbf{D}_f \mathbf{V}_f^H$, with the diagonal matrix $\mathbf{D}_f$ composed of the eigenvalues $d_{f,i}$, sorted non-increasingly. Then, the ADMM update rule (11) is accomplished by setting, separately for each $f \in [F]$,
$$\mathbf{X}^{(k+1)}[f] = \mathbf{V}_f \widetilde{\mathbf{D}}_f \mathbf{V}_f^H \tag{14}$$
with the diagonal matrix $\widetilde{\mathbf{D}}_f$ whose $i$th diagonal element is given by $\tilde{d}_{f,i} = \min\{(1/(2\rho))[-d_{f,i} + \sqrt{d_{f,i}^2 + 4\rho}\,], 1\}$.
If we define the block-thresholding operator $\mathbf{W}[\cdot] = \mathcal{S}_\kappa(\mathbf{Y}[\cdot])$ via $W_{i,j}[f] := (1 - \kappa/\|Y_{i,j}[\cdot]\|)_+ Y_{i,j}[f]$, the update rule (12) results in
$$\mathbf{Z}^{(k+1)}[\cdot] = \mathcal{S}_{\lambda/\rho}(\mathbf{X}^{(k+1)}[\cdot] + \mathbf{U}^{(k)}[\cdot]). \tag{15}$$
Proof:
Since the minimization problem (12) is equivalent to the ADMM update for a group LASSO problem [13, Sec. 6.4.2], the explicit form (15) follows from the derivation in [13, Sec. 6.4.2].
For the verification of (14), note that the optimization problem (11) splits into $F$ separate subproblems, one for each $f \in [F]$. The subproblem for a specific frequency bin $f$ is (omitting the index $f$)
$$\min_{\mathbf{0} \prec \mathbf{X} \preceq \mathbf{I}} -\log|\mathbf{X}| + \langle \mathbf{X}, \widehat{\mathbf{S}} + \rho(\mathbf{U}^{(k)} - \mathbf{Z}^{(k)}) \rangle + (\rho/2) \|\mathbf{X}\|_{\mathrm{F}}^2. \tag{16}$$
Let us denote the non-increasing eigenvalues of the Hermitian matrices $\mathbf{X}$ and $\mathbf{Y} := \widehat{\mathbf{S}} + \rho(\mathbf{U}^{(k)} - \mathbf{Z}^{(k)})$ by $x_i$ and $d_i$, for $i \in [p]$, respectively. According to [24, Lemma II.1], we have the trace inequality $\langle \mathbf{X}, \mathbf{Y} \rangle \geq \sum_{i \in [p]} x_i d_{p-i+1}$, with equality if $\mathbf{X}$ is of the form $\mathbf{X} = \mathbf{V}\, \mathrm{diag}\{x_{p-i+1}\}\, \mathbf{V}^H$ with a unitary matrix $\mathbf{V}$ whose columns are eigenvectors of $\mathbf{Y}$ (ordered according to the non-increasing $d_i$). Due to this trace inequality, a lower bound on (16) is
$$\min_{0 \leq x_i \leq 1} \sum_{i \in [p]} -\log x_i + x_i d_{p-i+1} + (\rho/2) x_i^2. \tag{17}$$
The minimum in (17) is achieved by the choice $\tilde{x}_i = h(d_{p-i+1})$ with $h(z) := \min\{(-z + \sqrt{z^2 + 4\rho})/(2\rho), 1\}$. However, for the choice $\mathbf{X} = \mathbf{V}\, \mathrm{diag}\{\tilde{x}_i\}\, \mathbf{V}^H$ (which is (14)), the objective function in (16) achieves the lower bound (17), certifying optimality.
□
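These closed-form updates translate directly into code. The following sketch (our own naming; sequences stored as arrays of shape (F, p, p)) implements (14) and (15):

```python
import numpy as np

def x_update(S_hat, Z, U, rho):
    """Update (11) via (14): per-frequency eigendecomposition."""
    X = np.empty_like(Z)
    for f in range(S_hat.shape[0]):
        Y = S_hat[f] + rho * (U[f] - Z[f])
        d, V = np.linalg.eigh(Y)  # eigenpairs of the Hermitian matrix Y
        # d_tilde applied elementwise to the eigenvalue of each eigenvector
        d_tilde = np.minimum((-d + np.sqrt(d ** 2 + 4 * rho)) / (2 * rho), 1.0)
        X[f] = (V * d_tilde) @ V.conj().T
    return X

def z_update(X, U, kappa):
    """Update (12) via block soft-thresholding (15), kappa = lambda/rho."""
    Y = X + U
    F = Y.shape[0]
    norms = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0) / F)  # ||Y_{i,j}[.]||
    scale = np.maximum(1.0 - kappa / np.maximum(norms, 1e-15), 0.0)
    return Y * scale[None, :, :]
```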
We summarize our GMS method in
Algorithm 1
Given samples $\mathbf{x}[1], \ldots, \mathbf{x}[N]$, parameters $T$, $F$, $\eta$, $\lambda$ and window function $w[m]$, perform the steps:
Step 1)
For each $f \in [F]$, compute the SDM estimate $\widehat{\mathbf{S}}[f]$ according to (8).
Step 2)
Approximate the gLASSO estimate $\widehat{\mathbf{K}}[\cdot]$ (cf. (6)) by iterating (14), (15) and (13) for a fixed number $T$ of iterations, yielding $\mathbf{X}^{(T)}[\cdot]$.
Step 3)
Estimate the edge set via $\widehat{\mathcal{E}}(\eta) = \{(i,j) : \|X^{(T)}_{i,j}[\cdot]\| \geq \eta\}$.
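Putting the pieces together, a compact sketch of Algorithm 1 could read as follows; it reuses the helper functions bt_sdm_estimate, x_update and z_update from the sketches above, and all default values are placeholders rather than recommendations:

```python
import numpy as np

def gms_glasso(x, F, T, lam, eta, rho=100.0, window=lambda m: np.exp(-m ** 2)):
    """GMS via time-series gLASSO: Steps 1-3 of Algorithm 1 (sketch)."""
    N, p = x.shape
    # Step 1: BT spectral density estimate (8)
    S_hat = bt_sdm_estimate(x, F, window)
    # Step 2: T ADMM iterations (14), (15), (13)
    X = np.tile(np.eye(p, dtype=complex), (F, 1, 1))
    Z = X.copy()
    U = np.zeros_like(X)
    for _ in range(T):
        X = x_update(S_hat, Z, U, rho)
        Z = z_update(X, U, lam / rho)
        U = U + X - Z
    # Step 3: edge set by thresholding (9)
    norms = np.sqrt(np.sum(np.abs(X) ** 2, axis=0) / F)
    edges = {(i, j) for i in range(p) for j in range(i + 1, p)
             if norms[i, j] >= eta}
    return edges, X
```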
C. Performance Analysis
Let us for simplicity assume that the ADMM iterates $\mathbf{X}^{(k)}[\cdot]$ have converged exactly to the gLASSO estimate $\widehat{\mathbf{K}}[\cdot]$ given by (6), i.e., $\mathbf{X}^{(T)}[\cdot] = \widehat{\mathbf{K}}[\cdot]$. Then, under Assumption II.1, a sufficient condition for our GMS method to recover the edge set of the CIG $\mathcal{G}$ is (10).
We will now derive an upper bound on the probability that (10) fails to hold. This will be accomplished by (i) showing that the norm $\|\boldsymbol{\Delta}\|_1$ can be bounded in terms of the SDM estimation error $\mathbf{E}[f] := \mathbf{S}[f] - \widehat{\mathbf{S}}[f]$ and (ii) controlling the probability that the error $\mathbf{E}[f]$ is sufficiently small.
Upper bounding $\|\boldsymbol{\Delta}\|_1$. By definition of $\widehat{\mathbf{K}}[\cdot]$ (cf. (6)),
$$-A\{\widehat{\mathbf{K}}\} + \langle \boldsymbol{\Delta}[\cdot], \widehat{\mathbf{S}}[\cdot] \rangle + \lambda(\|\widehat{\mathbf{K}}\|_1 - \|\mathbf{K}\|_1) \leq -A\{\mathbf{K}\}. \tag{18}$$
Combining this with $\mathop{\arg\min}_{\mathbf{X}[\cdot] \in \mathcal{C}} -A\{\mathbf{X}\} + \langle \mathbf{X}[\cdot], \mathbf{S}[\cdot] \rangle = \mathbf{K}[\cdot]$ yields
$$\lambda \|\widehat{\mathbf{K}}\|_1 \leq \lambda \|\mathbf{K}\|_1 + \langle \boldsymbol{\Delta}[\cdot], \mathbf{E}[\cdot] \rangle. \tag{19}$$
Let us, for the moment, assume validity of the condition
$$E := \max_{f \in [F]} \|\mathbf{E}[f]\|_\infty \leq \lambda/2, \tag{20}$$
implying $|\langle \boldsymbol{\Delta}[\cdot], \mathbf{E}[\cdot] \rangle| \leq (\lambda/2) \|\boldsymbol{\Delta}\|_1$ and, in turn via (19),
$$\lambda \|\widehat{\mathbf{K}}\|_1 \leq \lambda \|\mathbf{K}\|_1 + (\lambda/2) \|\boldsymbol{\Delta}\|_1. \tag{21}$$
Below, we present a specific choice for the tuning parameter $\lambda$ in (6) such that (20) holds with high probability. Using $\|\widehat{\mathbf{K}}\|_1 \overset{(\star)}{=} \|\widehat{\mathbf{K}}_{\mathcal{S}^c}\|_1 + \|\boldsymbol{\Delta}_{\mathcal{S}} + \mathbf{K}_{\mathcal{S}}\|_1$ and $\|\boldsymbol{\Delta}\|_1 = \|\boldsymbol{\Delta}_{\mathcal{S}}\|_1 + \|\boldsymbol{\Delta}_{\mathcal{S}^c}\|_1$ together with the (reverse) triangle inequality, (21) yields
$$\|\boldsymbol{\Delta}_{\mathcal{S}^c}\|_1 \overset{(\star)}{=} \|\widehat{\mathbf{K}}_{\mathcal{S}^c}\|_1 \leq 3 \|\boldsymbol{\Delta}_{\mathcal{S}}\|_1, \tag{22}$$
where $(\star)$ is due to $\mathcal{S} = \mathrm{gsupp}(\mathbf{K}[\cdot])$. Thus, the estimation error $\boldsymbol{\Delta}[\cdot]$ tends to be sparse.
As a next step, we rewrite (18) as
$$-(A\{\widehat{\mathbf{K}}\} - A\{\mathbf{K}\}) + \langle \boldsymbol{\Delta}[\cdot], \mathbf{S}[\cdot] \rangle \leq (3\lambda/2) \|\boldsymbol{\Delta}\|_1, \tag{23}$$
where we used (20). Let us denote the eigenvalues of the psd matrix $\boldsymbol{\Gamma}[f] := \mathbf{S}^{1/2}[f] \boldsymbol{\Delta}[f] \mathbf{S}^{1/2}[f]$ by $\gamma_{f,i}$. We can then rewrite the LHS of (23) as
$$-(A\{\widehat{\mathbf{K}}\} - A\{\mathbf{K}\}) + \langle \boldsymbol{\Delta}[\cdot], \mathbf{S}[\cdot] \rangle = \frac{1}{F} \sum_{f,i} -\log(1 + \gamma_{f,i}) + \gamma_{f,i}.$$
Since $\widehat{\mathbf{K}}[f], \mathbf{K}[f] \preceq \mathbf{I}$ and $\mathbf{S}[f] \preceq U \mathbf{I}$ (cf. (7), (2)), we have $\gamma_{f,i} \leq 2U$. Using $\log(1+x) = x - \frac{x^2}{2(1+\varepsilon x)^2}$, with some $\varepsilon \in [0,1]$, we further obtain
$$-(A\{\widehat{\mathbf{K}}\} - A\{\mathbf{K}\}) + \langle \boldsymbol{\Delta}[\cdot], \mathbf{S}[\cdot] \rangle \geq \|\boldsymbol{\Gamma}[\cdot]\|_{\mathrm{F}}^2 / (4(2U+1)^2).$$
Applying [25, Lemma 4.3.1(b)] to the RHS,
$$-(A\{\widehat{\mathbf{K}}\} - A\{\mathbf{K}\}) + \langle \boldsymbol{\Delta}[\cdot], \mathbf{S}[\cdot] \rangle \geq \frac{1}{4F(2U+1)^2} \sum_{f \in [F]} \mathrm{vec}\{\boldsymbol{\Delta}[f]\}^H (\bar{\mathbf{S}}[f] \otimes \mathbf{S}[f])\, \mathrm{vec}\{\boldsymbol{\Delta}[f]\}. \tag{24}$$
Combining (24) with (23) and Assumption II.3 (which can be invoked due to (22)), we arrive at
$$\|\boldsymbol{\Delta}\|_1 \leq 96(2U+1)^2 (s/\phi) \lambda. \tag{25}$$
Controlling the SDM estimation error. It remains to control the probability that condition (20) is valid, i.e., that the SDM estimation error $\mathbf{E}[f]$ incurred by the BT estimator (8) is sufficiently small. According to [26, Lemma IV.4], for any $\nu \in [0, 1/2)$,
$$P\{E \geq \nu + \mu_x^{(h_1)}\} \leq 2 \exp\left(-\frac{(1/32) N \nu^2}{\|w[\cdot]\|_1^2 U^2} + 2\log(2pN)\right), \tag{26}$$
where $\mu_x^{(h_1)}$ is the ACF moment (3) with $h_1[m] := |1 - w[m](1 - |m|/N)|$ for $|m| < N$ and $h_1[m] := 1$ otherwise.
The main result. Recall that a sufficient condition for $\widehat{\mathcal{E}}(\rho_{\min}/2)$, given by (9), to coincide with the true edge set is (10). Under the condition (20), implying validity of (25), the inequality (10) will be satisfied if $\lambda$ is chosen as
$$\lambda = \phi \rho_{\min} / (192 s (2U+1)^2). \tag{27}$$
Using (26) to bound the probability for (20) to hold yields
Proposition III.2:
Consider a stationary Gaussian zero-mean time series $\mathbf{x}[n]$ satisfying (2) and Assumptions II.1–II.3. Then, using the choice (27) in (6), and if the conditions
$$\mu_x^{(h_1)} \leq \phi \rho_{\min} / (384 s (2U+1)^2) \tag{28}$$
$$N / \log(4Np^2/\delta) \geq 10^8 (2U+1)^6 s^2 \phi^{-2} \rho_{\min}^{-2} \|w[\cdot]\|_1^2 \tag{29}$$
are satisfied, we have $P\{\widehat{\mathcal{E}}(\rho_{\min}/2) \neq \mathcal{E}\} \leq \delta$.
In order to satisfy condition (28), the window function $w[\cdot]$ in (8) has to be chosen as the indicator function of the effective support of the ACF $\mathbf{R}[m]$. Thus, the factor $\|w[\cdot]\|_1^2$ in (29) corresponds to a scaling of the sample size with the square of the effective ACF width. Moreover, the sufficient condition (29) scales inversely with the square of the minimum partial spectral coherence $\rho_{\min}$, which agrees with the scaling of the sufficient condition obtained for the neighborhood regression approach in [8]. Note that, while our performance analysis applies only to Gaussian time series, the GMS method in Algorithm 1 can also be applied to non-Gaussian time series. However, for a non-Gaussian time series the resulting edge estimate $\widehat{\mathcal{E}}(\eta)$ (cf. (9)) is no longer related to a CIG but to a partial correlation graph [3].
SECTION IV.
Numerical Results
We generated a Gaussian time series $\mathbf{x}[n]$ of dimension $p = 64$ by applying a finite impulse response filter $g[n]$ of length 2 to a zero-mean, stationary, white Gaussian noise process $\mathbf{e}[n] \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_0)$. We chose the covariance matrix $\mathbf{C}_0$ such that the resulting CIG $\mathcal{G} = ([p], \mathcal{E})$ is a star graph containing a hub node with $|\mathcal{E}| = 4$ neighbors. The corresponding precision matrix $\mathbf{K}_0 = \mathbf{C}_0^{-1}$ has main-diagonal entries equal to 0.5 and off-diagonal entries equal to 0.1. The filter coefficients $g[n]$ are such that the magnitude of the associated transfer function is uniformly bounded from above and below by positive constants, thereby ensuring that conditions (2) and (4) are satisfied with $L = 1$, $U = 10$ and $F = 4$. Thus, the generated time series satisfies Assumptions II.1–II.3 with $\rho_{\min} = 0.1 \sqrt{\sum_{m=-\infty}^{\infty} r_g^2[m]}$, $s = 4$ and $\phi = 1$. Here, $r_g[m] = \sum_{n=-\infty}^{\infty} g[n+m] g[n]$ denotes the autocorrelation sequence of $g[n]$.
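The following sketch shows one way to generate such a process; note that the paper does not specify the filter taps or the hub placement, so the values g = (1.0, 0.5) and hub node 0 below are purely illustrative assumptions.

```python
import numpy as np

def generate_star_process(p=64, n_hub_neighbors=4, N=256, g=(1.0, 0.5), seed=0):
    """Synthetic time series: FIR-filtered white noise whose innovation
    precision matrix K0 encodes a star CIG.

    The filter taps g and the hub placement are illustrative placeholders;
    the paper only specifies a length-2 FIR filter and a 4-edge star graph.
    """
    rng = np.random.default_rng(seed)
    # Precision matrix: 0.5 on the diagonal, 0.1 on the star edges (hub = node 0)
    K0 = 0.5 * np.eye(p)
    for j in range(1, n_hub_neighbors + 1):
        K0[0, j] = K0[j, 0] = 0.1
    C0 = np.linalg.inv(K0)
    chol = np.linalg.cholesky(C0)
    L = len(g)
    e = rng.standard_normal((N + L - 1, p)) @ chol.T  # e[n] ~ N(0, C0)
    # Apply the FIR filter g[n] componentwise: x[n] = sum_k g[k] e[n-k]
    x = np.zeros((N, p))
    for k, gk in enumerate(g):
        x += gk * e[L - 1 - k : L - 1 - k + N]
    return x, K0
```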
Based on $N \in \{128, 256\}$ observed samples, we estimated the edge set of the CIG using Algorithm 1 with $T = 10$ ADMM iterations, $F = 4$ frequency points and window function $w[m] = \exp(-m^2)$. The gLASSO parameter $\lambda$ (cf. (6) and (27)) was varied in the range $[c_1, c_2] \times \phi \rho_{\min} / (192 s (2U+1)^2)$, where the constants $c_1$ and $c_2$ were tuned empirically.
In Fig. 1, we show receiver operating characteristic (ROC) curves with the empirical false-alarm rate $P_{\mathrm{fa}} := \frac{1}{M} \sum_{i \in [M]} \frac{|\widehat{\mathcal{E}}_i \setminus \mathcal{E}|}{p(p-1)/2 - |\mathcal{E}|}$ and the empirical detection probability $P_{\mathrm{d}} := \frac{1}{M} \sum_{i \in [M]} \frac{|\widehat{\mathcal{E}}_i \cap \mathcal{E}|}{|\mathcal{E}|}$, both averaged over $M = 100$ independent simulation runs. Here, $\widehat{\mathcal{E}}_i$ denotes the edge estimate in the $i$-th simulation run.
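For reference, these two empirical rates can be computed from the per-run edge estimates as follows (a direct transcription of the formulas above; edges are represented as sets of pairs (i, j) with i < j):

```python
def empirical_rates(edge_estimates, true_edges, p):
    """Empirical false-alarm rate P_fa and detection probability P_d,
    averaged over M runs (edge_estimates: list of M edge sets)."""
    M = len(edge_estimates)
    n_non_edges = p * (p - 1) / 2 - len(true_edges)
    P_fa = sum(len(E_hat - true_edges) for E_hat in edge_estimates) / (M * n_non_edges)
    P_d = sum(len(E_hat & true_edges) for E_hat in edge_estimates) / (M * len(true_edges))
    return P_fa, P_d
```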