TD4

The TD4 module performs fourth-order temporal decorrelation using time-delayed cumulant matrices.

To resolve spatial and temporal anharmonic dependencies in molecular simulation trajectories, we designed the TD4 module, which performs joint diagonalization of time-delayed cumulant matrices (a tensor of fourth-order time-delayed statistics signifying kurtosis). TD4 is the counterpart of SD4, which minimizes fourth-order spatial correlations, that is, at zero time lag.

Conceptually, we assume that a molecular simulation trajectory is a linear combination of independent, anharmonically fluctuating protein motions. To discover these anharmonic motions, we borrow a technique from the signal processing literature called Blind Source Separation (BSS), which attempts to extract, or unmix, independent non-Gaussian sources from signal mixtures with Gaussian noise.

To facilitate the extraction of fourth-order anharmonic modes of motion, the trajectory data \(X_{orig} \in \mathbb{R}^{3N \times t}\), where 3N represents the (x,y,z) coordinates of the selected atoms and t represents the number of conformations, is first decorrelated for second-order dependencies, both spatially and temporally, by passing it through the SD2 and TD2 modules. The SD2 module removes dominant second-order spatial correlations by computing a spatial covariance matrix and performing principal component analysis (PCA). It diagonalizes the covariance matrix to obtain the projected data \(Y = B^TX_{orig} (m \times t)\), where m is the subspace dimensionality and the columns of \(B (3N \times m)\) are the dominant eigenvectors. Subsequently, the TD2 module removes dominant second-order temporal correlations by computing a time-delayed covariance matrix (specified by a lag time \(\tau\)) and performing PCA. A matrix Z is obtained by projecting the spatially resolved data matrix Y onto the dominant eigenvectors \(B_{TD2}\). The matrix Z then undergoes further transformations to retrieve mutually independent signals, which yields a separating matrix \(W \in \mathbb{R}^{m \times 3N}\).
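The SD2 projection step described above can be sketched in NumPy as follows. This is only an illustrative sketch, not the module's actual code: the function name `sd2_project` is hypothetical, and subtracting the mean conformation before computing the covariance is an assumption that the text does not state explicitly.

```python
import numpy as np

def sd2_project(X_orig, m):
    """Hypothetical helper: project 3N x t coordinates onto the m dominant PCA eigenvectors."""
    # Assumption: remove the mean conformation before computing the covariance.
    Xc = X_orig - X_orig.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / Xc.shape[1]              # 3N x 3N spatial covariance matrix
    evals, evecs = np.linalg.eigh(C)         # eigenvalues returned in ascending order
    order = np.argsort(evals)[::-1][:m]      # indices of the m largest eigenvalues
    B = evecs[:, order]                      # 3N x m dominant eigenvectors
    Y = B.T @ Xc                             # m x t projected data
    return Y, B, evals[order]
```

With this projection the rows of Y are mutually uncorrelated at zero lag, and their variances equal the retained eigenvalues.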

Algorithmically, the method of unmixing temporally correlated signals of fourth order can be viewed as a symmetric eigenvalue problem of a generalized cumulant matrix \(Q_{ij}\). As a measure of statistical independence, we consider the ‘diagonality’ of a set of cumulant matrices. The cumulant matrices are generated in a low-dimensional subspace of dimension m, which is the best guess for the most compact summary of the fourth-order statistics. The subspace dimensionality can be adjusted by examining the inflection points in the cumulative variance plots generated from the SD2 module.
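As a rough illustration of how a cumulative variance curve could guide the choice of m, the sketch below computes the fraction of total variance captured by the leading principal components. The function name `cumulative_variance` is hypothetical, and the threshold rule in the usage note is a simpler stand-in for inspecting inflection points by eye.

```python
import numpy as np

def cumulative_variance(X):
    """Hypothetical helper: cumulative fraction of variance per number of PCA components."""
    Xc = X - X.mean(axis=1, keepdims=True)            # centre each coordinate
    evals = np.linalg.eigvalsh(Xc @ Xc.T / Xc.shape[1])[::-1]  # descending eigenvalues
    evals = np.clip(evals, 0.0, None)                 # guard against tiny negative round-off
    return np.cumsum(evals) / np.sum(evals)
```

For example, `int(np.searchsorted(cumulative_variance(X), 0.9) + 1)` gives the smallest number of components that captures at least 90% of the variance; the 90% figure is an arbitrary illustrative choice.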

In order to generate the cumulant matrices, a time-lagged covariance matrix is defined by:

\[R_z{(\tau)} = E\left\{ZZ_{\tau}^T\right\},\]

where \(Z \in \mathbb{R}^{m \times t}\) is the molecular simulation data after second-order spatial and temporal decorrelation, \(\tau\) is the time delay and \(Z_{\tau} = Z(t-\tau)\) is the time-lagged version of Z. A fourth-order cumulant matrix \(Q_{ij}\) of this data matrix Z is defined by:

\[Q_{ij} = E\left\{ZZ^TZ_{\tau}^TZ_{\tau}\right\} - E\left\{ZZ^T\right\} \textrm{tr}\, E\left\{Z_{\tau}Z_{\tau}^T\right\} -2E\left\{ZZ_{\tau}^T\right\}E\left\{Z_{\tau}Z^T\right\},\]

where \(Q_{ij} \in \mathbb{R}^{m \times m}\) is a time-lagged cumulant matrix. Here the expectations are taken over snapshots, with Z and \(Z_{\tau}\) read as the m-dimensional column vectors at each snapshot. Computational errors, such as round-off errors, can destroy the symmetry of the cumulant matrix, which is restored by performing:

\[Q_{ij} = \frac{1}{2} \left[Q_{ij} + Q_{ij}^T\right].\]
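The definitions above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the module's implementation: the function names are hypothetical, each expectation is estimated as an average over the T - lag snapshots for which both Z(t) and Z(t - lag) exist, and Z and \(Z_{\tau}\) inside the expectations are read as per-snapshot column vectors.

```python
import numpy as np

def lagged_covariance(Z, lag):
    """Hypothetical helper: estimate R_z(tau) = E{z(t) z(t - tau)^T}."""
    T = Z.shape[1]
    Z0, Zt = Z[:, lag:], Z[:, :T - lag]      # aligned z(t) and z(t - tau)
    return Z0 @ Zt.T / (T - lag)

def lagged_cumulant(Z, lag):
    """Hypothetical helper: symmetrized fourth-order time-lagged cumulant matrix."""
    T = Z.shape[1]
    n = T - lag
    Z0, Zt = Z[:, lag:], Z[:, :n]
    w = np.sum(Zt**2, axis=0)                # z_tau^T z_tau, one scalar per snapshot
    first = (Z0 * w) @ Z0.T / n              # E{z z^T (z_tau^T z_tau)}
    R0 = Z0 @ Z0.T / n                       # E{z z^T}
    Rtt = Zt @ Zt.T / n                      # E{z_tau z_tau^T}
    Rt = lagged_covariance(Z, lag)           # E{z z_tau^T}; its transpose is E{z_tau z^T}
    Q = first - R0 * np.trace(Rtt) - 2.0 * (Rt @ Rt.T)
    return 0.5 * (Q + Q.T)                   # restore symmetry lost to round-off
```

Each of the three terms is symmetric in exact arithmetic, so the final averaging step only removes round-off asymmetry, in line with the text.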

A time-lagged cumulant tensor \(\mathbb{Q} \in \mathbb{R}^{m \times (m \times k)}\), where \(k = m(m+1)/2\), stores the cumulant matrices computed from the symmetrized \(Q_{ij}\) matrix. Joint diagonalization of these time-lagged cumulant matrices reduces fourth-order temporal dependencies, leading to anharmonic modes of motion of the trajectory data. This is done through a Jacobi-type iterative method. In particular, the method applies successive transformations that shrink the off-diagonal elements of the cumulant matrices with each iteration, leaving them as close to diagonal as possible. The spatio-temporally decorrelated matrix of fourth order is computed by:

\[{Z_{TD4}} = W X_{orig},\]

where W attempts to separate the sources from the signal mixture \(X_{orig}\) by finding directions such that the projections onto these directions have maximum statistical independence. The computed matrix \(Z_{TD4}\) is the fourth-order spatially and temporally resolved data.
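To make the joint diagonalization step more concrete, the sketch below applies Jacobi-style plane (Givens) rotations to a stack of symmetric matrices, choosing each rotation angle to minimize the summed squared off-diagonal entries across all matrices. It is an illustration only and may differ from the module's code: the function name `joint_diagonalize`, the `(K, m, m)` storage layout (rather than the \(m \times (m \times k)\) tensor described above), and the `tol` and `max_sweeps` parameters are all assumptions.

```python
import numpy as np

def joint_diagonalize(mats, tol=1e-8, max_sweeps=100):
    """Hypothetical helper: approximate joint diagonalization of symmetric matrices."""
    A = np.array(mats, dtype=float)          # copy, shape (K, m, m)
    m = A.shape[1]
    V = np.eye(m)                            # accumulated orthogonal transform
    for _ in range(max_sweeps):
        rotated = False
        for p in range(m - 1):
            for q in range(p + 1, m):
                # 2 x K summary of the (p, q) plane across all matrices
                g = np.vstack([A[:, p, p] - A[:, q, q],
                               A[:, p, q] + A[:, q, p]])
                gg = g @ g.T
                ton = gg[0, 0] - gg[1, 1]
                toff = gg[0, 1] + gg[1, 0]
                theta = 0.25 * np.arctan2(toff, ton)   # angle minimizing off-diagonals
                if abs(theta) > tol:
                    rotated = True
                    c, s = np.cos(theta), np.sin(theta)
                    G = np.array([[c, -s], [s, c]])
                    V[:, [p, q]] = V[:, [p, q]] @ G
                    A[:, [p, q], :] = G.T @ A[:, [p, q], :]   # rotate rows p, q
                    A[:, :, [p, q]] = A[:, :, [p, q]] @ G     # rotate columns p, q
        if not rotated:                      # a full sweep with no rotation: converged
            break
    return V, A
```

On return, each `A[k]` equals `V.T @ mats[k] @ V`. When the input matrices share a common orthogonal eigenbasis they are diagonalized exactly up to the tolerance; otherwise the result is the closest joint approximation the sweeps reach.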

Parameters

Z       - an m x T matrix that is spatially and temporally uncorrelated to second order (m subspace dimensions, T samples). May be a NumPy array or matrix.

m       - dimensionality of the subspace we are interested in. Defaults to None, in which case m=n.

T       - number of snapshots of the MD trajectory

V       - separating matrix obtained after performing PCA on m components of the real data, followed by temporal decorrelation of the spatially whitened data

lag     - lag time, given as an integer number of time steps

verbose - print information on progress. Default is True.

Returns

W - a separating matrix obtained from resolving fourth-order temporal correlations

Reference

  1. Georgiev, P., & Cichocki, A. (2003). Robust independent component analysis via time-delayed cumulant functions. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 86(3), 573-579.