A time-varying complex network with fractional dynamics has node activities that are influenced over time by their own history as well as by the activities of the other nodes. Systems with such attributes enable the modeling of coupled non-stationary processes that exhibit long-term memory [1]–[5]. A multitude of examples can be found in application domains such as biology and engineering, e.g., bacteria dynamics [6]–[8] and swarm robotics [9], [10], respectively. Nonetheless, their applicability becomes even more useful in the context of heterogeneous networks that interact geographically and across different time scales, as often occurs in cyber-physical systems (CPS).
In the present manuscript, we are motivated by the recent success of fractional-order dynamics in modeling spatiotemporal properties of physiological signals, such as the electrocardiogram (ECG), electromyogram (EMG), electroencephalogram (EEG), and blood-oxygenation-level-dependent (BOLD) signals, just to mention a few [11], [12]. Despite these modeling capabilities, one main limitation continues to elude scientists and engineers alike. Specifically, complex networks such as the brain, whose nodes evolve according to fractional-order dynamics, are often observed only locally. This means that some of the dynamics captured by the models are not only due to local interactions, but might also be driven by unknown sources, i.e., stimuli that are external to the considered network. Consequently, we propose a model that enables us to account for the existence of such unknown stimuli, and to determine the model that best captures the local dynamics under such stimuli. Observe that this enhances the analysis of these systems, since we obtain an additional feature (i.e., the stimuli) that can be the main driver of possible abnormal behavior of the network [13].
In the context of neuroscience and medical applications, the ability to determine unknown inputs is crucial for retrieving the model under the so-called input artifacts [14] captured in EEG readings, which are due to pulses that cause a larger variation of the potential than the baseline. Alternatively, a model that copes with the presence of unknown inputs enables the modeling of stimuli originating in the subcortical structures of the brain, which are often neglected in current EEG and functional magnetic resonance imaging (fMRI) studies that leverage BOLD signals [15]. Thus, it is imperative to develop such a framework, as well as tools that enable us to perform the analysis and design of the systems modeled by such a setup, which we introduce in the current paper.
To the best of the authors' knowledge, such a framework has not been advanced in the context of discrete-time fractional-order dynamics. In the domain of continuous-time fractional dynamics, works such as [16], [17] address the design of observers in the presence of unknown inputs. The closest work for the discrete-time case is [18], which does not consider unknown inputs. Nonetheless, the usefulness of accounting for unknown inputs in the context of linear time-invariant (LTI) systems is a long-standing topic [19]–[24]. Specifically, the papers closest to the results proposed here are as follows: observers with unknown inputs [20], delayed systems with unknown inputs [24], and estimation of unknown inputs with sparsity constraints [25]. Notwithstanding, LTI systems are not good approximations of fractional-order dynamical systems due to their limited memory capabilities. Yet, owing to the numerical approximation of the highly nonlinear, infinite-dimensional fractional-order system used, we are able to obtain a finite-dimensional closed-form description that enables us to derive results akin to those attained in the context of LTI systems.
The main contributions of the present paper are threefold: (i) we present an alternating scheme that enables us to determine the best estimate of the model parameters and unknown stimuli; (ii) we provide necessary and sufficient conditions under which it is possible to retrieve the state and unknown stimuli; and (iii) based on these conditions, we determine a small subset of variables that need to be measured to ensure that both the state and the input can be recovered, while establishing sub-optimality guarantees with respect to the smallest possible subset.
The remainder of the paper is organized as follows. Section II introduces the model and the main problems studied in this manuscript. Next, in Sections III and IV, we present the solutions to these problems. Finally, in Section V, we present an illustrative example of the main results using real EEG data from a wearable technology.
SECTION II.
Problem Formulation
In this section, we first describe the time-varying complex network model with fractional-order dynamics under unknown excitations. Next, building on this model, we state the two main analysis and design problems addressed in the present paper.
A. System Model
We consider a linear discrete-time fractional-order dynamical model described as follows:
\begin{align*}
\Delta^{\alpha}x[k+1] &= Ax[k]+Bu[k]\\
y[k] &= Cx[k],\tag{1}
\end{align*}
where $x\in\mathbb{R}^{n}$ is the state, $u\in\mathbb{R}^{p}$ is the unknown input, and $y\in\mathbb{R}^{n}$ is the output vector. Also, we can describe the system by the tuple of matrices $(\alpha, A, B, C)$ of appropriate dimensions. In what follows, we assume that the input size is always strictly less than the size of the state vector, i.e., $p<n$. The difference between a classic linear time-invariant model and the above model is the inclusion of the fractional-order derivative, whose expansion and discretization [26] for any $i$th state ($1\leqslant i\leqslant n$) can be written as
\begin{equation*}
\Delta^{\alpha_{i}}x_{i}[k]=\sum\limits_{j=0}^{k}\psi(\alpha_{i}, j)x_{i}[k-j],\tag{2}
\end{equation*}
where $\alpha_{i}$ is the fractional order corresponding to the $i$th state and $\psi(\alpha_{i},j)=\frac{\Gamma(j-\alpha_{i})}{\Gamma(-\alpha_{i})\Gamma(j+1)}$, with $\Gamma(\cdot)$ denoting the gamma function.
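For concreteness, the coefficients $\psi(\alpha_{i},j)$ and the fractional difference in (2) can be evaluated numerically. The following Python sketch is illustrative only (it is not the authors' implementation); it uses the recurrence $\psi(\alpha,0)=1$ and $\psi(\alpha,j)=\psi(\alpha,j-1)\,(j-1-\alpha)/j$, which is algebraically equivalent to the Gamma-function ratio above.

```python
import numpy as np
from scipy.special import gamma

def psi(alpha: float, j_max: int) -> np.ndarray:
    """Coefficients psi(alpha, j) for j = 0, ..., j_max via a stable recurrence."""
    c = np.empty(j_max + 1)
    c[0] = 1.0
    for j in range(1, j_max + 1):
        c[j] = c[j - 1] * (j - 1 - alpha) / j
    return c

def fractional_difference(x: np.ndarray, alpha: float) -> np.ndarray:
    """Delta^alpha x[k] = sum_{j=0}^{k} psi(alpha, j) * x[k - j], for each k."""
    c = psi(alpha, len(x) - 1)
    return np.array([c[:k + 1] @ x[k::-1] for k in range(len(x))])

# Spot check against the Gamma-function definition psi = Gamma(j-a) / (Gamma(-a) Gamma(j+1)).
a = 0.6
assert np.allclose(psi(a, 5), [gamma(j - a) / (gamma(-a) * gamma(j + 1)) for j in range(6)])
```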
Having defined the system model, an important step is system identification, i.e., the estimation of the model parameters from the given data. This becomes nontrivial when there are unknown inputs, since one has to differentiate which part of the evolution of the complex network is due to its intrinsic dynamics and which part is due to the unknown inputs. Subsequently, one of the first problems we need to address is that of system identification from data, as described next.
B. Data-Driven Model Estimation
The fractional-order dynamical model captures the long-range memory that is a property of many physiological signals. The estimation of the spatiotemporal parameters when there are no inputs to the system was addressed in [18]. However, ignoring the inputs inherently assumes that the system is isolated from its external surroundings. Hence, for the model to capture the system dynamics, the inclusion of unknown inputs is necessary. Therefore, the first problem that we consider is as follows.
Problem-1
Given the input coupling matrix $B$ and measurements of all states across a time horizon $[t, t+T-1]$ of length $T$, we aim to estimate the model parameters $(\alpha, A)$ and the unknown inputs $\{u[k]\}_{k=t}^{t+T-2}$.
Notice that this would extend the work in [18] to include the case of unknown inputs. In fact, we will see in Section III that the proposed solution would result in a different set of model parameters.
C. Sensor Selection
For the system model described by (1), where the system parameters were determined as part of the solution to Problem-1, we consider that the output measurements are collected by only a subset of sensors. In numerous applications (for example, physiological systems), the sensors are dedicated, i.e., each sensor captures an individual state [27], so the measurement model can be written as
\begin{equation*}
y[k]=\mathbb{I}^{S}x[k],\tag{3}
\end{equation*}
where $\mathbb{I}^{S}$ is the matrix constructed by selecting the rows indexed by the set $S$ from the $n\times n$ identity matrix $I_{n}$. As an example, if all sensors are selected, i.e., $S=[n]\equiv\{1,2,\ldots,n\}$, then $\mathbb{I}^{S}=I_{n}$. For selecting the best set of sensors $S$, with knowledge of the system matrices and the given observations, we resort to the constraint of perfect observability, defined as follows.
Definition 1 (Perfect Observability)
A system described by (1) is called perfectly observable if, given the system matrices $(\alpha, A, B, C)$ and $K$ observations $y[k]$, $0\leqslant k\leqslant K-1$, it is possible to recover the initial state $x[0]$ and the unknown inputs $\{u[k]\}_{k=0}^{K-2}$.
Subsequently, the second problem that we consider is as follows.
Problem-2
Determine a minimum-cardinality set of sensors $S$ such that the system whose dynamics is captured by $(\alpha, A, B, \mathbb{I}^{S})$ is perfectly observable from the $K$ measurements collected by the dedicated outputs indexed by $S$, i.e.,
\begin{gather*}
\min\limits_{S\subseteq[n]}\vert S\vert \\
\mathrm{s.t.}\quad (\alpha, A, B, \mathbb{I}^{S})\ \text{is perfectly observable}.\tag{4}
\end{gather*}
In Section IV, we derive the algebraic constraints that characterize perfect observability, which are later used to obtain a solution to (4).
SECTION III.
Model Estimation
We consider the problem of estimating $\alpha$, $A$, and the inputs $\{u[k]\}_{k=t}^{t+T-2}$ from the given limited observations $y[k]$, $k\in[t, t+T-1]$, which, due to the dedicated nature of the sensing mechanism, coincide with $x[k]$, under the assumption that the input matrix $B$ is known. The realization of $B$ can be application dependent and is computed separately using experimental data. For simplicity of notation, let us denote $z[k]=\Delta^{\alpha}x[k+1]$ with $k$ chosen appropriately. The pre-factors in the summation in (2) scale as $\psi(\alpha_{i},j)\sim\mathcal{O}(j^{-\alpha_{i}-1})$ and, therefore, for computational ease we limit the summation in (2) to the first $J$ values, where $J>0$ is sufficiently large. Therefore, $z_{i}[k]$ can be written as
\begin{equation*}
z_{i}[k]= \sum\limits_{j=0}^{J-1}\psi(\alpha_{i}, j)x_{i}[k+1-j],\tag{5}
\end{equation*}
with the assumption that $x[k], u[k]=0$ for $k\leqslant t-1$. Using the above notation and the model definition in (1), the given observations are written as
\begin{equation*}
z[k]=Ax[k]+Bu[k]+e[k],\tag{6}
\end{equation*}
where $e\sim\mathcal{N}(0,\Sigma)$ is assumed to be Gaussian noise independent across space and time. For simplicity, we assume that each noise component has the same variance, i.e., $\Sigma=\sigma^{2}I$. Also, let us denote the system matrices as $A=[a_{1},a_{2},\ldots,a_{n}]^{T}$ and $B=[b_{1},b_{2},\ldots,b_{n}]^{T}$. The vertically concatenated states and inputs during an arbitrary window of time are written as $X_{[t-1,t+T-2]}=[x[t-1],x[t],\ldots,x[t+T-2]]^{T}$ and $U_{[t-1,t+T-2]}=[u[t-1],u[t],\ldots,u[t+T-2]]^{T}$, respectively, and for any $i$th state we have $Z_{i,[t-1,t+T-2]}=[z_{i}[t-1],z_{i}[t],\ldots,z_{i}[t+T-2]]^{T}$. For brevity, we drop the time-horizon subscript from these matrices when it is clear from the context.
Since the problem of jointly estimating the different parameters is highly nonlinear, we proceed as follows: (i) we estimate the fractional order $\alpha$ using the wavelet technique described in [28]; and (ii) with $\alpha$ known, $z$ in (5) can be computed under the additional assumption that the input matrix $B$ is known. Therefore, the problem reduces to estimating $A$ and the inputs $\{u[k]\}_{k=t}^{t+T-2}$. Towards this goal, we propose the following sequential optimization algorithm, similar to an expectation-maximization (EM) algorithm [29]. Briefly, the EM algorithm is used for maximum likelihood estimation (MLE) of parameters in the presence of hidden variables. Intuitively, in our case, Algorithm 1 estimates $A$ in the presence of hidden variables, or unknown unknowns, $\{u[k]\}_{k=t}^{t+T-2}$. The 'E-step' averages out the effect of the unknown unknowns and obtains an estimate of $u$, where, due to the diversity of solutions, we control the sparsity of the inputs (using the parameter $\lambda'$). Subsequently, the 'M-step' performs the MLE to obtain an estimate of $A$. The solution provided in [18] can be related to the proposed technique as follows.
Upon setting $\{u[k]\}_{k=t}^{t+T-2}=0$ in the E-step of Algorithm 1, the M-step would be the same at each iteration. Hence, the algorithm stays at the initial point, which is the solution in [18]. □
Algorithm 1: EM Algorithm
It is worthwhile to mention the following result regarding the EM algorithm.
Theorem 1
The incomplete-data (without unknown unknowns) likelihood $\mathbb{P}(z,x;A^{(l)})$ is non-decreasing after an EM iteration.
Hence, the proposed algorithm, being an EM algorithm (detailed formulation in Appendix I), has non-decreasing likelihood. Additionally, we have the following result about the incomplete-data likelihood.
Proposition 1
The incomplete-data likelihood $\mathbb{P}(z,x;A^{(l)})$ is bounded at each iteration $l$.
We can comment on the convergence of Algorithm 1 as follows.
Using Theorem 1, Proposition 1, and the monotone convergence theorem, we can claim that the likelihood $\mathbb{P}(z,x;A^{(l)})$ converges. □
It should be noted that convergence in likelihood does not always guarantee convergence of the parameters. However, as emphasized in [31], from a numerical viewpoint the convergence of the parameters is not as important as the convergence of the likelihood. Also, the EM algorithm can converge to saddle points, as exemplified in [29].
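To make the alternating scheme of Algorithm 1 concrete, the following minimal Python sketch mirrors the E- and M-steps described above: the E-step estimates the unknown inputs by $\ell_{1}$-regularized least squares (solved here with a plain ISTA loop), and the M-step re-estimates $A$ row-wise by ordinary least squares. The fractional orders are assumed to be pre-estimated (e.g., via the wavelet method of [28]) so that $z[k]$ in (6) is available; all names and the ISTA solver are illustrative choices, not the paper's released code.

```python
import numpy as np

# z, x: (T, n) arrays whose rows are z[k] from (5) and the measured states x[k];
# B: known n x p input-coupling matrix.

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def e_step(z, x, A, B, lam, sigma2=1.0, iters=200):
    """For each k: u[k] = argmin_u ||z[k]-Ax[k]-Bu||^2/(2*sigma2) + lam*||u||_1 (via ISTA)."""
    T, p = z.shape[0], B.shape[1]
    U = np.zeros((T, p))
    L = np.linalg.norm(B, 2) ** 2 / sigma2          # Lipschitz constant of the smooth part
    for k in range(T):
        r, u = z[k] - A @ x[k], np.zeros(p)
        for _ in range(iters):
            grad = B.T @ (B @ u - r) / sigma2
            u = soft_threshold(u - grad / L, lam / L)
        U[k] = u
    return U

def m_step(z, x, U, B):
    """Row-wise least squares: a_i fitted to z_i[k] - u[k]^T b_i over the horizon."""
    resid = z - U @ B.T                              # remove the estimated input contribution
    return np.linalg.lstsq(x, resid, rcond=None)[0].T

def estimate_model(z, x, B, lam=0.1, n_iters=10):
    """Alternate E- and M-steps; the initialization is the no-input solution of [18]."""
    A = np.linalg.lstsq(x, z, rcond=None)[0].T
    for _ in range(n_iters):
        U = e_step(z, x, A, B, lam)
        A = m_step(z, x, U, B)
    return A, U
```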
SECTION IV.
Sensor Selection
Before formulating the sensor selection problem, we review the properties of the fractional-order system and derive closed-form expressions for the state vector. Using the expansion of the fractional-order derivative in (2), we can write the state evolution equation as
\begin{equation*}
x[k+1]=Ax[k]- \sum\limits_{j=1}^{k+1}D(\alpha, j)x[k+1-j]+Bu[k],\tag{7}
\end{equation*}
where $D(\alpha,j)=\text{diag}(\psi(\alpha_{1},j),\psi(\alpha_{2},j),\ldots,\psi(\alpha_{n},j))$. Alternatively, (7) can be written as
\begin{equation*}
x[k+1]= \sum\limits_{j=0}^{k}A_{j}x[k-j]+Bu[k],\tag{8}
\end{equation*}
where $A_{0}=A-D(\alpha,1)$ and $A_{j}=-D(\alpha,j+1)$ for $j\geqslant 1$. With this definition of $A_{j}$, we can define the matrices $G_{k}$ as follows [32]:
\begin{equation*}
G_{k}=\begin{cases}
I_{n} & k=0,\\
\sum\limits_{j=0}^{k-1}A_{j}G_{k-1-j} & k\geqslant 1.
\end{cases}\tag{9}
\end{equation*}
Thus, we can obtain the following result.
The solution to the system described by (1) is given by
\begin{equation*}
x[k]=G_{k}x[0]+ \sum\limits_{j=0}^{k-1}G_{k-1-j}Bu[j].\tag{10}
\end{equation*}
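A minimal sketch of how the matrices $G_{k}$ in (9) and the closed-form evolution (10) could be computed is given below; psi() refers to the coefficient routine sketched in Section II (an assumption of this sketch, not library code).

```python
import numpy as np

def build_G(A, alpha, K):
    """G_0 = I_n and G_k = sum_{j=0}^{k-1} A_j G_{k-1-j}, with A_0 = A - D(alpha,1), A_j = -D(alpha,j+1)."""
    n = A.shape[0]
    psi_table = np.array([psi(a, K) for a in alpha])            # psi_table[i, j] = psi(alpha_i, j)
    A_seq = [A - np.diag(psi_table[:, 1])]                       # A_0
    A_seq += [-np.diag(psi_table[:, j + 1]) for j in range(1, K)]
    G = [np.eye(n)]
    for k in range(1, K):
        G.append(sum(A_seq[j] @ G[k - 1 - j] for j in range(k)))
    return G

def simulate(A, B, alpha, x0, U, K):
    """x[k] = G_k x[0] + sum_{j=0}^{k-1} G_{k-1-j} B u[j], as in (10)."""
    G = build_G(A, alpha, K)
    return np.array([G[k] @ x0 + sum(G[k - 1 - j] @ B @ U[j] for j in range(k))
                     for k in range(K)])
```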
A. System Observability
To achieve perfect observability, i.e., to retrieve the initial state and the unknown inputs, we need the system matrices and $K$ observations. While perfect observability is defined for any observation matrix $C$, if we set $C=\mathbb{I}^{S}$ as introduced in Section II, then $K$ and $S$ become intertwined. Simply speaking, by increasing $K$ we acquire more measurements across time, which can compensate for the number of measurements available at each time instant, dictated by the set $S$ of sensors used.
Given the K observations from k=0 to K−1, we can represent the initial state x[0] and the unknown inputs using (10) by defining the following matrices
\begin{equation*}
\Theta=[(CG_{0})^{T},(CG_{1})^{T},\ldots,(CG_{K-1})^{T}]^{T},\tag{11}
\end{equation*}
and
\begin{equation*}
\Xi =\begin{bmatrix}
0 & 0 & \cdots & 0 & 0\\
CG_{0}B & 0 & \cdots & 0 & 0\\
CG_{1}B & CG_{0}B & \cdots & 0 & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots \\
CG_{K-2}B & CG_{K-3}B & \cdots & CG_{0}B & 0
\end{bmatrix},\tag{12}
\end{equation*}
where $C$ and $B$ are the observation and input matrices from (1), respectively, and $G_{k}$ is as defined in (9). Having $\Theta$ and $\Xi$ defined and using (1), we can write the initial state and inputs in terms of the observations as
\begin{equation*}
Y=\Theta\, x[0]+\Xi U,\tag{13}
\end{equation*}
where $Y=[y[0]^{T},y[1]^{T},\ldots,y[K-1]^{T}]^{T}$ and $U=[u[0]^{T},u[1]^{T},\ldots,u[K-1]^{T}]^{T}$.
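The matrices $\Theta$ and $\Xi$ can be assembled directly from the $G_{k}$; the sketch below (illustrative only, reusing the hypothetical build_G() from the previous listing) follows the block structure of (11) and (12), with the last block column of $\Xi$ left at zero since $u[K-1]$ does not affect $y[0],\ldots,y[K-1]$.

```python
import numpy as np

def build_Theta_Xi(A, B, C, alpha, K):
    """Stack Theta (11) and the block lower-triangular Xi (12) for a horizon of K measurements."""
    m, p = C.shape[0], B.shape[1]
    G = build_G(A, alpha, K)
    Theta = np.vstack([C @ G[k] for k in range(K)])              # (K*m) x n
    Xi = np.zeros((K * m, K * p))                                # last block column stays zero
    for k in range(1, K):                                        # block row k of (12)
        for j in range(k):
            Xi[k * m:(k + 1) * m, j * p:(j + 1) * p] = C @ G[k - 1 - j] @ B
    return Theta, Xi
```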
Using (13) and Definition 1, a necessary and sufficient condition for perfect observability is obtained as follows.
Proposition 2
The system described by (1) is perfectly observable after $K$ measurements if and only if $\text{rank}([\Theta\ \Xi])=n+(K-1)p$.
Proof
The proof follows from re-writing equation (13) as
\begin{equation*}
Y=[\Theta\ \Xi]\begin{bmatrix}x[0]\\ U\end{bmatrix},\tag{14}
\end{equation*}
and, therefore, $x[0]$ and the first $K-1$ inputs in $U$ can be recovered if and only if $\text{rank}([\Theta\ \Xi])=n+(K-1)p$. □
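In code, the test of Proposition 2 reduces to a single rank computation; the sketch below assumes the build_Theta_Xi() helper from the previous listing.

```python
import numpy as np

def perfectly_observable(A, B, C, alpha, K):
    """Rank test of Proposition 2: rank([Theta Xi]) == n + (K-1)p."""
    n, p = A.shape[0], B.shape[1]
    Theta, Xi = build_Theta_Xi(A, B, C, alpha, K)
    return np.linalg.matrix_rank(np.hstack([Theta, Xi])) == n + (K - 1) * p
```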
Proposition 2 will be key to formulate the constraints in the sensor selection problem as detailed in the next section.
B. Sensor Selection Problem
Given the system matrices $(\alpha, A, B)$ and the first $K$ observations, the problem of sensor selection consists of selecting the minimum number of sensors such that the system is perfectly observable. It can be mathematically written as
\begin{equation*}
\min\limits_{S\subseteq[n]}\vert S\vert\qquad
\mathrm{s.t.}\quad \text{rank} ([\Theta\ \Xi]\vert_{C=\mathbb{I}^{S}})=n+(K-1)p,\tag{15}
\end{equation*}
where $\text{rank}([\Theta\ \Xi]\vert_{C=\mathbb{I}^{S}})$ denotes the rank of the matrix $[\Theta\ \Xi]$ when $\Theta$ and $\Xi$ are constructed from (11) and (12) with $C=\mathbb{I}^{S}$. An analogous problem of sensor selection with no inputs is studied in [27], where it was shown to be NP-hard; hence, (15) is at least as computationally challenging, since it contains the former as a special case when $B=0$.
Consequently, we propose a sub-optimal solution to the above problem, while providing optimality guarantees within a constant factor. For the discrete set $\Omega=[n]$, a function $f:2^{\Omega}\to\mathbb{R}$ is called submodular if for all sets $S,T\subseteq\Omega$
\begin{equation*}
f(S\cup T)+f(S\cap T)\leqslant f(S)+f(T).\tag{16}
\end{equation*}
Also, the marginal gain of an element $a$ with respect to a set $S$ is defined as $f_{S}(a)=f(S\cup\{a\})-f(S)$. Alternatively, a set function is submodular if and only if it satisfies the diminishing-returns property, i.e., $f_{S}(a)\geqslant f_{T}(a)$ for all $S\subseteq T\subseteq\Omega$ and $a\in\Omega\setminus T$ [33]. Monotone submodular functions have the remarkable property that greedy selection performs within a constant factor of optimality [34].
Motivated by these attributes, let us define the discrete set function $f(S)=\text{rank}([\Theta\ \Xi]\vert_{C=\mathbb{I}^{S}})$.
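As a concrete (purely illustrative) realization of this set function, the sketch below evaluates $f(S)$ by constructing $[\Theta\ \Xi]$ with $C=\mathbb{I}^{S}$ and computing its rank; it reuses the hypothetical build_Theta_Xi() helper from Section IV-A.

```python
import numpy as np

def f_rank(S, A, B, alpha, K):
    """f(S) = rank([Theta Xi]) when the observation matrix is I^S (rows of I_n indexed by S)."""
    if not S:
        return 0
    C = np.eye(A.shape[0])[sorted(S), :]
    Theta, Xi = build_Theta_Xi(A, B, C, alpha, K)
    return np.linalg.matrix_rank(np.hstack([Theta, Xi]))
```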
Theorem 2
The discrete set function $f(S)=\text{rank}([\Theta\ \Xi]\vert_{C=\mathbb{I}^{S}})$ is submodular in $S$.
Since $f$ is submodular, we use a greedy selection of sensors to maximize the rank of $[\Theta\ \Xi]$; in other words, we greedily select sensors until $f(S)=n+(K-1)p$. The sensor selection procedure is described as Algorithm 2.
Lemma 3
The complexity of Algorithm 2 with a total of $n$ sensors and a time horizon of length $K$ is $\mathcal{O}(n^{5}K^{3})$, i.e., polynomial.
Proof
With $\Omega=[n]$, the computation of $f_{S^{G}}(s)$ requires at most $\mathcal{O}(n^{3}K^{3})$ operations. Being a forward greedy selection, the algorithm performs at most $n(n+1)/2$ such evaluations, and hence its complexity is at most $\mathcal{O}(n^{5}K^{3})$. □
Therefore, by Theorem 2 and Lemma 3, Algorithm 2 provides a sub-optimal solution to the NP-hard problem (15), with optimality guarantees within a constant factor, at polynomial complexity.
Algorithm 2: Greedy Sensor Selection
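A minimal greedy-selection sketch consistent with the description of Algorithm 2 is given below: it repeatedly adds the sensor with the largest marginal gain in $f(S)$ until the perfect-observability rank $n+(K-1)p$ is reached. The routine f_rank() is the set-function sketch above; the early-exit logic for infeasible horizons is an illustrative choice, not part of the paper's listing.

```python
import numpy as np

def greedy_sensor_selection(A, B, alpha, K):
    """Grow S one sensor at a time, always taking the largest marginal gain in f(S)."""
    n, p = A.shape[0], B.shape[1]
    target = n + (K - 1) * p
    S, current = set(), 0
    while current < target and len(S) < n:
        gains = {s: f_rank(S | {s}, A, B, alpha, K) for s in range(n) if s not in S}
        best = max(gains, key=gains.get)
        if gains[best] == current:          # no remaining sensor improves the rank:
            break                           # (15) is infeasible for this horizon K
        S.add(best)
        current = gains[best]
    return sorted(S)
```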
SECTION V.
Illustrative Example
We demonstrate the application of the results derived in the previous sections on physiological signals. In particular, we consider 64-channel EEG signals that record the brain activity of 109 subjects. The 64-channel/electrode distribution with the corresponding labels and numbers is shown in Figure 1. The subjects were asked to perform motor and imagery tasks. The data was collected with the BCI2000 system at a sampling rate of 160 Hz [35], [36].
A. Model Parameters Estimation
The parameters of the system model, $\alpha$ and $A$, are estimated by applying Algorithm 1. The performance of the EM algorithm, like that of any iterative algorithm, depends crucially on its initial conditions. For the considered EEG dataset, we observed that the algorithm converges quickly; in fact, even a single iteration was sufficient to reach a local maximum of the likelihood, which indicates that the chosen initial point for the EM algorithm is considerably good. The input coupling matrix $B$ can be easily determined through experiments. The values predicted by the model are compared with the actual data in Figure 2. The one-step prediction follows the actual data very closely, while there is a small mismatch in the five-step prediction. The ratio of the root mean squared prediction error of the model with inputs to that of the model without inputs [27] is shown in Figure 3 for all 109 subjects. As observed, the error ratio is less than one-third when unknown inputs are considered. Therefore, the fractional-order dynamical model with unknown inputs fits the EEG data much better than the model with no inputs. In the next part, we use these estimated parameters to compute the set of sensors needed for perfect observability.
B. Sensor Selection
The estimated parameters are used to construct $\Theta$ and $\Xi$ as defined in (11) and (12) for the greedy sensor selection in Algorithm 2. Upon applying Algorithm 2, we found that roughly half of the sensors (35 out of 64) are sufficient to retrieve the initial state and the unknown inputs uniquely. The selected sensors for a particular subject are marked in Figure 1. Due to the large size and sparsity of the $[\Theta\ \Xi]$ matrix, some entries of the estimated initial state may blow up because of very small singular values. In such cases, we can first remove the unknown inputs from (13) by multiplying both sides by $W=I-\Xi(\Xi^{T}\Xi)^{-1}\Xi^{T}$, i.e., by projecting the signals onto the orthogonal complement of the column space of $\Xi$. We can then add a norm penalty on $x[0]$ to its least squares estimation or, in other words, use ridge regression [37], i.e.,
\begin{equation*}
x[0]= \arg\min\limits_{x}\Vert WY-W\Theta x\Vert_{2}^{2}+\epsilon\Vert x\Vert_{2}^{2},
\end{equation*}
where $\epsilon>0$.
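A minimal sketch of this recovery step is shown below; a pseudo-inverse stands in for $(\Xi^{T}\Xi)^{-1}$ to cope with possible rank deficiency, which is an implementation choice of this sketch rather than part of the paper's formulation.

```python
import numpy as np

def recover_initial_state(Theta, Xi, Y, eps=1e-3):
    """Project out the unknown inputs with W, then ridge-solve for x[0]."""
    W = np.eye(Xi.shape[0]) - Xi @ np.linalg.pinv(Xi)   # projector onto the orthogonal complement of range(Xi)
    Phi, b = W @ Theta, W @ Y
    n = Theta.shape[1]
    return np.linalg.solve(Phi.T @ Phi + eps * np.eye(n), Phi.T @ b)
```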
Once the initial state is known, the unknown inputs are recovered in a similar fashion or by enforcing a sparsity constraint. Figure 4 shows a comparison between the actual and retrieved initial states when using the sensor set $S^{G}$ returned by Algorithm 2. The retrieved initial states are close to the actual values, given that they are estimated in the presence of numerical errors and sparsity.
The presented experimental results show how the proposed schemes are useful in mapping the complex dynamics of the brain in the presence of unknown stimuli. The same framework can easily be applied to the study of other complex time-evolving networks, such as other physiological systems, e.g., BOLD signals, EMG, and ECG.
SECTION VI.
Conclusion
In this paper, we introduced a framework for discrete-time fractional-order dynamical systems under unknown inputs, and provided tools to perform the analysis and design of such systems. Specifically, for the analysis, we presented an alternating scheme that enables us to determine the best estimate of the model parameters and unknown stimuli, and we provided necessary and sufficient conditions under which the state and unknown stimuli can be retrieved. Furthermore, we enabled the design of the sensing capabilities of such systems by providing a mechanism to determine a small subset of variables that need to be measured to ensure that both the state and the input can be recovered, while establishing sub-optimality guarantees with respect to the smallest possible subset.
Future research will focus on exploiting the structure of fractional-order dynamical systems in multi-case scenarios, under a quantitative description of the estimation quality of the states and inputs. Such an extension will enable us to determine the confidence in the obtained model and will permit formal claims about the mechanisms underlying the process under study. Additionally, some of the algorithms need to be improved before they can be deployed in real-time applications where energy and computational resources are limited.
ACKNOWLEDGMENT
The authors are thankful to the reviewers for their valuable feedback. G.G. and P.B. gratefully acknowledge the support by the U.S. Army Defense Advanced Research Projects Agency (DARPA) under grant no. W911NF-17-1-0076 and DARPA Young Faculty Award under grant no. N66001-17-1-4044.
Appendix I EM Formulation
We present the detailed construction of the EM-like algorithm in this section. In our formulation, the observed (incomplete) data are $x$ and $z$, while $u$ is the hidden data; therefore, the complete data are $(z,x,u)$. Let us consider $\Sigma=\sigma^{2}I$, and at the $l$th iteration denote
\begin{equation*}
u[k]^{\ast}= \arg\max\limits_{u}\mathbb{P}\left(u\vert z[k], x[k]; A^{(l)}\right).
\end{equation*}
We can enforce a Laplacian prior on $u[k]$ for sparsity (any other prior could also be used), such that $\mathbb{P}(u[k])\propto\exp(-\lambda\Vert u[k]\Vert_{1})$. Therefore, $u[k]^{\ast}$ is derived as
\begin{align*}
u[k]^{\ast} &= \arg \max\limits_{u}\log \mathbb{P}\left(u\vert z[k], x[k]; A^{(l)}\right)\\
&= \arg\max\limits_{u}\log \mathbb{P}(u)+\log \mathbb{P}\left(z[k], x[k]\vert u;A^{(l)}\right)\\
&= \arg\max\limits_{u} - \frac{1}{2\sigma^{2}}\Vert z[k]-A^{(l)}x[k]-Bu\Vert_{2}^{2}-\lambda\Vert u\Vert_{1}.
\end{align*}
We approximate the conditional distribution as $\mathbb{P}(u[k]\vert z[k],x[k];A^{(l)})\approx \mathbb{1}_{\{u[k]=u^{\ast}[k]\}}$. In the final step of the expectation, we can write
\begin{align*}
Q(A;A^{(l)})&=\mathbb{E}_{A^{(l)}}[\log \mathbb{P}_{c}(z[k], x[k], u[k])\vert x[k], z[k]]\\
&=\mathbb{E}_{u[k]\vert z[k], x[k]; A^{(l)}}\left[\log \mathbb{P}\left(z[k], x[k], u[k]; A^{(l)}\right)\right]\\
&=\mathbb{E}_{u[k]\vert z[k], x[k]; A^{(l)}}\left[\log \mathbb{P}\left(z[k], x[k]; A^{(l)}\right)\right]\\
&\qquad+\log \mathbb{P}\left(u[k]\vert z[k], x[k]; A^{(l)}\right)\\
&=\log \mathbb{P}\left(z[k], x[k]; A^{(l)}\right)\mathbb{1}_{\{u[k]=u^{\ast}[k]\}},
\end{align*}
where $\mathbb{P}_{c}$ is used to signify the likelihood of the complete data. For the maximization step, we can simply write
\begin{align*}
A^{(l+1)} &= \arg \max\limits_{A}Q(A;A^{(l)})\\
&= \arg \max\limits_{A}\log \mathbb{P}(z[k], x[k]; A)\,\mathbb{1}_{\{u[k]=u^{\ast}[k]\}},
\end{align*}
or, in other words,
\begin{equation*}
a_{i}^{(l+1)}= \arg\min\limits_{a}\Vert \tilde{Z}_{i}-Xa\Vert_{2}^{2},
\end{equation*}
where the $k$th component of $\tilde{Z}_{i}$ is $\tilde{Z}_{i,k}=z_{i}[k]-u^{\ast}[k]^{T}b_{i}$.
Appendix II Proof of Proposition 1
Proof
We show that the likelihood of the incomplete (observed) data is bounded at each EM update step. Let us denote the likelihood of the observed data under the parameter $A^{(l)}$ as
\begin{equation*}
\mathbb{P}(A^{(l)})=\mathbb{P}(z, x;A^{(l)}),\tag{17}
\end{equation*}
which is further written as
\begin{align*}
\mathbb{P}(A^{(l)}) &\propto \int \mathbb{P}(z, x\vert u;A^{(l)})\mathbb{P}(u)du\\
&= C\int\exp\left(-\frac{1}{2\sigma^{2}}\Vert z-A^{(l)}x-Bu\Vert_{2}^{2}\right)\exp(-\lambda\Vert u\Vert_{1})du\\
&\leqslant C\int\exp(-\lambda\Vert u\Vert_{1})du\leqslant\mathcal{O}(1),
\end{align*}
where $C$ is the normalization constant. Therefore, $\mathbb{P}(A^{(l)})$ is bounded for every iteration index $l\geqslant 0$. □
Appendix III Proof of Theorem 2
Proof
For a given $n\times m$ matrix $A$, let $\mathcal{R}(S)$ denote the span of the rows of $A$ indexed by the set $S$, and let $\vert\mathcal{R}(S)\vert$ denote its dimension. Let $f(S)$ be the rank of the matrix composed of the rows indexed by $S$; therefore, $f(S)=\vert\mathcal{R}(S)\vert$. For $\Omega=\{1,2,\ldots,n\}$, $S\subseteq T\subseteq\Omega$, and $a\in\Omega\setminus T$, we can write
\begin{align*}
\vert \mathcal{R}(T\cup a)\vert &= \vert \mathcal{R}(S\cup a)\vert +\vert \mathcal{R}(T\backslash S)\cap \mathcal{R}(S\cup a)^{\perp}\vert \\
&= \vert \mathcal{R}(S\cup a)\vert +\vert \mathcal{R}(T\backslash S)\cap \mathcal{R}(S)^{\perp}\cap \mathcal{R}(a)^{\perp}\vert \\
&\leqslant \vert \mathcal{R}(S\cup a)\vert +\vert \mathcal{R}(T\backslash S)\cap \mathcal{R}(S)^{\perp}\vert \\
&= \vert \mathcal{R}(S\cup a)\vert +\vert \mathcal{R}(T)\vert -\vert \mathcal{R}(S)\vert,
\end{align*}
where the last equality uses the fact that the dimension of the intersection of $\mathcal{R}(T\backslash S)$ with the orthogonal complement of $\mathcal{R}(S)$ equals the number of linearly independent rows in $\mathcal{R}(T)$ that are not in $\mathcal{R}(S)$, i.e., $\vert\mathcal{R}(T)\vert-\vert\mathcal{R}(S)\vert$. Rearranging the inequality gives $f_{T}(a)\leqslant f_{S}(a)$, i.e., the diminishing-returns property, which establishes submodularity. □